* Grub-install, superblock corrupted/erased and other animals
From: Aaron Scheiner @ 2011-07-27 12:16 UTC
To: linux-raid

Hi

My original message was sent (and rejected) 4 days ago because it was in HTML (whoops). Here's my original e-mail with updates:

I've got an Ubuntu machine with an mdadm RAID array on level 6. The RAID-6 array consists of 10 drives and has been running for about 3 years now. I recently upgraded the drives in the array from 1TB units to 2TB units.

The drive on which the OS sat died a few days ago, so I installed a new OS drive and then installed Ubuntu Server on it. On reboot the machine hung on a black screen with a white flashing cursor. So I went back into the Ubuntu Setup and installed grub to all of the drives in the raid array (except two) [wow, this was such a stupid move].

I then rebooted the machine and it successfully booted into Ubuntu Server. I set about restoring the configuration for the raid array... only to be given the message "No Superblock Found" (more or less). Each element in the array was used directly by mdadm (so /dev/sda, not /dev/sda1).

I see that the superblock is stored within the MBR region on the drive (which is 512 bytes from the start of the disk), which would explain why the superblocks were destroyed. I haven't been able to find anything regarding a backup superblock (does something like this exist?).

I have now started using a script that tries to re-create the array by running through the various permutations available... it takes roughly 2.5 seconds per permutation/iteration and there are just over 40 000 possibilities. The script tests for a valid array by trying to mount the array as read-only (it's an XFS file system). I somehow doubt that it will mount even when the correct combination of disks is found.
[UPDATE]: It never mounted.

So... I have an idea... The array has a whole bunch of movie files on it and I have exact copies of some of them on another raid array. So I was thinking that if I searched for the start of one of those files on the scrambled array, I could work out the order of the disks by searching forward until I found a change. I could then compare the changed area (probably 128KB/the chunk size forward) with the file I have and see where that chunk lies in the file, thereby working out the order.
[UPDATE]: Seeing as the array never mounted, I have proceeded with this idea. I took samples of the start of the video file and provided them to Scalpel as needles for recovery. After two days of searching, Scalpel located the starts of the various segments in the raid array (I re-created the raid array with the drives in random order). I then carved (using dd) 3MB out of the raid array that contains all the samples handed to Scalpel originally (plus a bit more).

So now I have to find segments of the start of the intact file in the carved-out data from the raid array.

It would be really useful if I knew the layout of the array:

If the chunk size of the array is 128KB, does that mean that the file I carved will be divided up into segments of contiguous data, each 128KB in length? Or does it mean that the length of contiguous data will be 16KB (128KB / 8 drives)?

Do these segments follow on from each other without interruption, or is there some other data in-between (like metadata? I'm not sure where that resides).
Any explanation of the structure of a raid6 array would be greatly appreciated, as well as any other advice (tools, tips, etc).

Thanks :)

Aaron (24)
South Africa
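Aaron's brute-force idea is easy to picture as a script. Below is a minimal sketch of that approach, assuming hypothetical device names, that the two drives grub skipped sit in known slots, a 128K chunk and a mount point of /mnt/test; it is not the script he actually ran, and recreating arrays like this should only be attempted with --assume-clean and read-only mounts.

#!/usr/bin/env python3
# Brute-force drive-order search: pin the two untouched drives in their (assumed
# known) slots, permute the other eight, recreate the array without a resync and
# try a read-only XFS mount.  All device names and slot numbers are hypothetical.
import itertools, subprocess

KNOWN = {0: "/dev/sdX", 5: "/dev/sdY"}                       # slot -> surviving drive (assumed)
UNKNOWN = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd",
           "/dev/sde", "/dev/sdf", "/dev/sdg", "/dev/sdh"]   # the eight overwritten drives

def try_order(perm):
    devs, it = [], iter(perm)
    for slot in range(10):
        devs.append(KNOWN.get(slot) or next(it))
    subprocess.run(["mdadm", "--stop", "/dev/md0"], stderr=subprocess.DEVNULL)
    create = ["mdadm", "--create", "/dev/md0", "--metadata=1.2", "--level=6",
              "--raid-devices=10", "--chunk=128", "--assume-clean", "--run"] + devs
    if subprocess.run(create, stderr=subprocess.DEVNULL).returncode != 0:
        return False
    # mount read-only so a wrong guess never writes to the filesystem
    ok = subprocess.run(["mount", "-t", "xfs", "-o", "ro", "/dev/md0", "/mnt/test"],
                        stderr=subprocess.DEVNULL).returncode == 0
    if ok:
        subprocess.run(["umount", "/mnt/test"])
    return ok

for perm in itertools.permutations(UNKNOWN):                 # 8! = 40320 orderings
    if try_order(perm):
        print("candidate order:", perm)
        break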
* Re: Grub-install, superblock corrupted/erased and other animals
From: NeilBrown @ 2011-08-02 6:39 UTC
To: Aaron Scheiner; +Cc: linux-raid

On Wed, 27 Jul 2011 14:16:52 +0200 Aaron Scheiner <blue@aquarat.za.net> wrote:

> Hi
>
> My original message was sent (and rejected) 4 days ago because it was in HTML (whoops). Here's my original e-mail with updates :
>
> I've got an Ubuntu machine with an mdadm RAID array on level 6. The RAID-6 array consists of 10 drives and has been running for about 3 years now. I recently upgraded the drives in the array from 1TB units to 2TB units.
>
> The drive on which the OS sat died a few days ago, so I installed a new OS drive and then installed Ubuntu Server on it.
> On reboot the machine hung on a black screen with a white flashing cursor. So I went back into the Ubuntu Setup and installed grub to all of the drives in the raid array (except two) [wow, this was such a stupid move].

So you still have two with valid superblocks? (or did before you started re-creating). Do you have a copy of the "mdadm --examine" of those? It would be helpful.

> I then rebooted the machine and it successfully booted into Ubuntu Server. I set about restoring the configuration for the raid array... only to be given the message "No Superblock Found" (more or less). Each element in the array was used directly by mdadm (so /dev/sda, not /dev/sda1).
>
> I see that the superblock is stored within the MBR region on the drive (which is 512 bytes from the start of the disk), which would explain why the superblocks were destroyed. I haven't been able to find anything regarding a backup superblock (does something like this exist?).
>
> I have now started using a script that tries to re-create the array by running through the various permutations available... it takes roughly 2.5 seconds per permutation/iteration and there are just over 40 000 possibilities. The script tests for a valid array by trying to mount the array as read only (it's an XFS file system). I somehow doubt that it will mount even when the correct combination of disks is found.
> [UPDATE] : It never mounted.

Odd... Possibly you have a newer mdadm which uses a different "Data offset". The "mdadm --examine" of the 2 drives that didn't get corrupted would help confirm that.

> So... I have an idea... The array has a whole bunch of movie files on it and I have exact copies of some of them on another raid array. So I was thinking that if I searched for the start of one of those files on the scrambled array, I could work out the order of the disks by searching forward until I found a change. I could then compare the changed area (probably 128KB/the chunk size forward) with the file I have and see where that chunk lies in the file, thereby working out the order.
> [UPDATE] : Seeing as the array never mounted, I have proceeded with this idea. I took samples of the start of the video file and provided them to Scalpel as needles for recovery. After two days of searching, Scalpel located the starts of the various segments in the raid array (I re-created the raid array with the drives in random order). I then carved (using dd) 3MBs out of the raid array that contains all the samples handed to scalpel originally (plus a bit more).
>
> So now I have to find segments of the start of the intact file in the carved out data from the raid array.
>
> It would be really useful if I knew the layout of the array :
>
> If the chunk size of the array is 128KB, does that mean that the file I carved will be divided up into segments of contiguous data, each 128KB in length? or does it mean that the length of contiguous data will be 16KB (128KB / 8 drives)?

128KB per device, so the first alternative.

> Do these segments follow on from each other without interruption or is there some other data in-between (like metadata? I'm not sure where that resides).

That depends on how XFS lays out the data. It will probably be mostly contiguous, but no guarantees.

> Any explanation of the structure of a raid6 array would be greatly appreciated, as well as any other advice (tools, tips, etc).

The stripes start "Data Offset" from the beginning of the device, which I think is 1MB with recent mdadm, but something like 64K with earlier mdadm.

The first few stripes are:

   Q   0   1   2   3   4   5   6   7   P
   8   9  10  11  12  13  14  15   P   Q
  17  18  19  20  21  22  23   P   Q  16

Where 'P' is xor parity, Q is GF parity, and N is chunk number N. Each chunk is 128KB (if that is your chunk size).

This pattern repeats after 10 stripes.

good luck,
NeilBrown
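Neil's rotation can be reproduced programmatically. A small sketch that prints the same left-symmetric RAID-6 layout for 10 devices, so the pattern can be extended past the three stripes shown above; the device count and 128K chunk are the values from this thread.

# Prints the chunk layout Neil describes for md RAID-6 "left-symmetric" (algorithm 2):
# P rotates back one device per stripe, Q sits on the next device, and data chunks
# fill the rest starting just after Q.
N_DEV  = 10          # total devices (8 data + P + Q)
N_DATA = N_DEV - 2
CHUNK  = 128 * 1024  # bytes per chunk

def stripe_layout(s):
    """Return a list of length N_DEV: 'P', 'Q' or the data-chunk number on each device."""
    row = [None] * N_DEV
    p = (N_DEV - 1 - s) % N_DEV
    q = (p + 1) % N_DEV
    row[p], row[q] = "P", "Q"
    dev = (q + 1) % N_DEV
    for i in range(N_DATA):
        row[dev] = s * N_DATA + i
        dev = (dev + 1) % N_DEV
    return row

for s in range(4):
    print("stripe %d:" % s, " ".join("%3s" % c for c in stripe_layout(s)))
# stripe 0:   Q   0   1   2   3   4   5   6   7   P
# stripe 1:   8   9  10  11  12  13  14  15   P   Q
# stripe 2:  17  18  19  20  21  22  23   P   Q  16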
* Re: Grub-install, superblock corrupted/erased and other animals
From: Stan Hoeppner @ 2011-08-02 8:01 UTC
To: NeilBrown; +Cc: Aaron Scheiner, linux-raid

On 8/2/2011 1:39 AM, NeilBrown wrote:
> On Wed, 27 Jul 2011 14:16:52 +0200 Aaron Scheiner <blue@aquarat.za.net> wrote:
>> Do these segments follow on from each other without interruption or is there some other data in-between (like metadata? I'm not sure where that resides).
>
> That depends on how XFS lays out the data. It will probably be mostly contiguous, but no guarantees.

Looks like he's still under the 16TB limit (8*2TB drives) so this is an 'inode32' XFS filesystem. inode32 and inode64 have very different allocation behavior. I'll take a stab at an answer, and though the following is not "short" by any means, it's not nearly long enough to fully explain how XFS lays out data on disk.

With inode32, all inodes (metadata) are stored in the first allocation group, maximum 1TB, with file extents in the remaining AGs. When the original array was created (and this depends a bit on how old his kernel/xfs module/xfsprogs are) mkfs.xfs would have queried mdraid for the existence of a stripe layout. If found, mkfs.xfs would have created 16 allocation groups of 500GB each, the first 500GB AG being reserved for inodes. inode32 writes all inodes to the first AG and distributes files fairly evenly across top level directories in the remaining 15 AGs.

This allocation parallelism is driven by directory count. The more top level directories, the greater the filesystem write parallelism. inode64 is much better as inodes are spread across all AGs instead of being limited to the first AG, giving metadata heavy workloads a boost (e.g. maildir). inode32 filesystems are limited to 16TB in size, while inode64 is limited to 16 exabytes. inode64 requires a fully 64 bit Linux operating system, and though inode64 scales far beyond 16TB, one can use inode64 on much smaller filesystems for the added benefits.

This allocation behavior is what allows XFS to have high performance with large files, as free space management within and across multiple allocation groups keeps file fragmentation to a minimum. Thus, there are normally large spans of free space between AGs on a partially populated XFS filesystem.

So, to answer the question, if I understood it correctly, there will indeed be data spread all over all of the disks with large free space chunks in between. The pattern of files on disk will not be contiguous. Again, this is by design, and yields superior performance for large file workloads, the design goal of XFS. It doesn't do horribly bad with many small file workloads either.

--
Stan
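To make Stan's picture concrete, a short sketch of where each allocation group would sit on the assembled array. The 16 x 500GB geometry is taken from his description purely as an example; the real values would come from xfs_info or xfs_db (agcount, agsize and blocksize).

# Rough sketch: byte offset of each XFS allocation group, assuming the geometry
# Stan describes (16 AGs of 500GB).  AGs are laid out consecutively from the start
# of the filesystem; with inode32, AG 0 holds all the inodes.
AG_COUNT = 16
AG_SIZE  = 500 * 10**9          # bytes per AG (assumed, not measured)

for ag in range(AG_COUNT):
    start = ag * AG_SIZE
    print("AG %2d starts at byte %15d (%.2f TB into the filesystem)" % (ag, start, start / 1e12))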
* Re: Grub-install, superblock corrupted/erased and other animals
From: Aaron Scheiner @ 2011-08-02 16:24 UTC
To: Stan Hoeppner; +Cc: NeilBrown, linux-raid

wow... I had no idea XFS was that complex, great for performance, horrible for file recovery :P . Thanks for the explanation.

Based on this the scalpel+lots of samples approach might not work... I'll investigate XFS a little more closely, I just assumed it would write big files in one continuous block.

This makes a lot of sense; I reconstructed/re-created the array using a random drive order, scalpel'ed the md device for the start of the video file and found it. I then dd'ed that out to a file on the hard drive and then loaded that into a hex editor. The file ended abruptly after about +/-384KBs. I couldn't find any other data belonging to the file within 50MBs around the sample scalpel had found.

Thanks again for the info.

On Tue, Aug 2, 2011 at 10:01 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> [...]
* Re: Grub-install, superblock corrupted/erased and other animals
From: Stan Hoeppner @ 2011-08-02 16:41 UTC
To: Aaron Scheiner; +Cc: NeilBrown, linux-raid

On 8/2/2011 11:24 AM, Aaron Scheiner wrote:
> wow... I had no idea XFS was that complex, great for performance, horrible for file recovery :P . Thanks for the explanation.
>
> Based on this the scalpel+lots of samples approach might not work... I'll investigate XFS a little more closely, I just assumed it would write big files in one continuous block.

Maybe I didn't completely understand what you're trying to do...

As long as there is enough free space within an AG, any newly created file will be written contiguously (from the FS POV). If you have 15 extent AGs and write 30 of these files, 2 will be written into each AG. There will be lots of free space between the last file in AG2 and AG3, on down the line. When I said the data would not be contiguous, I was referring to the overall composition of the filesystem, not individual files. Depending on their size, individual files will be broken up into, what, 128KB chunks, and spread across the 8 disk stripe, by mdraid.

> This makes a lot of sense; I reconstructed/re-created the array using a random drive order, scalpel'ed the md device for the start of the video file and found it. I then dd'ed that out to a file on the hard drive and then loaded that into a hex editor. The file ended abruptly after about +/-384KBs. I couldn't find any other data belonging to the file within 50MBs around the sample scalpel had found.

What is the original size of this video file?

> Thanks again for the info.

Sure thing. Hope you get it going.

--
Stan
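Putting Stan's point and Neil's rotation together, a small sketch of how a byte offset inside the assembled md device maps to a member disk. It assumes the thread's geometry (10 devices, 128K chunk, left-symmetric layout, a 256-sector data offset), so the numbers it prints are only as trustworthy as those assumptions.

# Map a byte offset in the md array to (device slot, byte offset on that device).
N_DEV, N_DATA, CHUNK = 10, 8, 128 * 1024
DATA_OFFSET = 256 * 512            # bytes skipped at the start of each device (assumed)

def locate(array_offset):
    chunk_no = array_offset // CHUNK          # nth data chunk of the array
    stripe   = chunk_no // N_DATA
    col      = chunk_no % N_DATA              # data column within the stripe
    p        = (N_DEV - 1 - stripe) % N_DEV   # P device for this stripe
    q        = (p + 1) % N_DEV                # Q device
    dev      = (q + 1 + col) % N_DEV          # data devices start just after Q
    dev_off  = DATA_OFFSET + stripe * CHUNK + array_offset % CHUNK
    return dev, dev_off

print(locate(0))            # -> (1, 131072): the array's first 128K lives on slot 1
print(locate(9 * CHUNK))    # a chunk on the second stripe; parity has rotated one device left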
* Re: Grub-install, superblock corrupted/erased and other animals
From: Aaron Scheiner @ 2011-08-02 21:13 UTC
To: Stan Hoeppner; +Cc: NeilBrown, linux-raid

Oh right, I see, my mistake.

The file is just one of a set of files that I duplicated across two arrays. The entire folder (with almost all the duplicated files in it) was approximately 2TBs in size. The file I'm using for comparison is 11GBs in size.

The array was originally 8TBs in size, but I upgraded it recently (May 2011) to 16TBs (using 2TB drives). As part of the upgrade process I copied all the data from the older array to the new array in one large cp command. I expect this would have had the effect of defragmenting the files... which is great seeing as I'm relying on low fragmentation for this process :P .

So there's a good chance then that searching on all the drives for 512-byte samples from various points in the "example" file will allow me to work out the order of the drives.

Scalpel is 70% through the first drive. Scans of both the first and second drives should be complete by tomorrow morning (my time) yay :) .

Just of interest; machines on a Gigabit LAN used to be able to read data off the array at around 60MB/sec... which I was very happy with. Since the upgrade to 2TB drives the array has been reading at over 100MB/sec, saturating the ethernet interface. Do you think the new drives are the reason for the speed increase? (the new drives are cheap Seagate 5900 rpm "Green Power" drives, the old drives were Samsung 7200 rpm units) Or do you think the switch from JFS to XFS (and aligning partitions with cylinder boundaries) may have been part of it?

On Tue, Aug 2, 2011 at 6:41 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> [...]
* Re: Grub-install, superblock corrupted/erased and other animals
From: Stan Hoeppner @ 2011-08-03 4:02 UTC
To: Aaron Scheiner; +Cc: NeilBrown, linux-raid

On 8/2/2011 4:13 PM, Aaron Scheiner wrote:
> Oh right, I see, my mistake.
>
> The file is just one of a set of files that I duplicated across two arrays. The entire folder (with almost all the duplicated files in it) was approximately 2TBs in size. The file I'm using for comparison is 11GBs in size.
> The array was originally 8TBs in size, but I upgraded it recently (May 2011) to 16TBs (using 2TB drives). As part of the upgrade process I copied all the data from the older array to the new array in one large cp command. I expect this would have had the effect of defragmenting the files... which is great seeing as I'm relying on low fragmentation for this process :P .

Ok, so you didn't actually *upgrade* the existing array. You built a new array, laid a new filesystem on it, and copied everything over. Yes, the newly written files would not be fragmented.

> So there's a good chance then that searching on all the drives for 512-byte samples from various points in the "example" file will allow me to work out the order of the drives.

This seems like a lot of time. Were you unable to retrieve the superblocks from the two drives?

> Scalpel is 70% through the first drive. Scans of both the first and second drives should be complete by tomorrow morning (my time) yay :)

Disk access is slow compared to CPU. Why not run 10 instances of scalpel in parallel, one for each disk, and be done in one big shot?

> Just of interest; machines on a Gigabit LAN used to be able to read data off the array at around 60MB/sec... which I was very happy with. Since the upgrade to 2TB drives the array has been reading at over 100MB/sec, saturating the ethernet interface. Do you think the new drives are the reason for the speed increase? (the new drives are cheap Seagate 5900 rpm "Green Power" drives, the old drives were Samsung 7200 rpm units) Or do you think the switch from JFS to XFS (and aligning partitions with cylinder boundaries) may have been part of it?

Any answer I could give would be pure speculation, as the arrays in question are both down and cannot be tested head to head. The answer could be as simple as the new SAS/SATA controller(s) being faster than the old interface(s). As you've changed so many things, it's probably a combination of all of them yielding the higher performance. Or maybe you changed something in the NFS/CIFS configuration, changed your kernel, something ancillary to the array that increased network throughput. Hard to say at this point with so little information provided.

--
Stan
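Stan's "one scalpel per disk" suggestion, sketched below with hypothetical device names and output directories. The -c/-o options match scalpel's documented usage, but are worth double-checking against the installed version.

# Launch one scalpel scan per component drive and wait for them all to finish.
import subprocess

drives = ["/dev/sd%s" % x for x in "abcdefghij"]        # hypothetical names
procs = []
for d in drives:
    out = "/recovery/scalpel-%s" % d.split("/")[-1]      # one output dir per drive
    procs.append(subprocess.Popen(["scalpel", "-c", "samples.conf", "-o", out, d]))

for p in procs:                                          # block until every scan is done
    p.wait()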
* Re: Grub-install, superblock corrupted/erased and other animals
From: Aaron Scheiner @ 2011-08-02 16:16 UTC
To: NeilBrown; +Cc: linux-raid

Hi Neil! I mailed this list originally because of your website.

Thanks for the info regarding layout, it's very helpful and explains a lot.

I saved the output of the -E argument before zero-ing the surviving superblocks:

"
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : e4a266e5:6daac63f:ea0b6986:fb9361cc
           Name : nasrat:0
  Creation Time : Fri May 20 17:32:20 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 3907028896 (1863.02 GiB 2000.40 GB)
     Array Size : 31256213504 (14904.12 GiB 16003.18 GB)
  Used Dev Size : 3907026688 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : f971c81d:da719d55:b55129f1:802d76ea

Internal Bitmap : 2 sectors from superblock
    Update Time : Wed Jul 20 12:40:39 2011
       Checksum : 7ff625b6 - correct
         Events : 421070

     Chunk Size : 128K
"

and the second surviving superblock:

"
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : e4a266e5:6daac63f:ea0b6986:fb9361cc
           Name : nasrat:0
  Creation Time : Fri May 20 17:32:20 2011
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 3907026688 (1863.02 GiB 2000.40 GB)
     Array Size : 31256213504 (14904.12 GiB 16003.18 GB)
    Data Offset : 256 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : ba9da7a2:c47b749f:57081441:5b5e8a4e

Internal Bitmap : 2 sectors from superblock
    Update Time : Wed Jul 20 12:40:39 2011
       Checksum : 2aa3a36 - correct
         Events : 421070

     Chunk Size : 128K
"

This is what I've recently done: I wrote an application that takes samples of the file I have (a copy of which was also on the damaged array) and writes those samples into a config file for scalpel. The application collected 40 samples, each sample is 512 bytes long. There are gaps of 30KBytes between samples (so the samples cover roughly 1.2MBytes in total).

Scalpel is currently searching through the drives for the samples. I'm hoping that once scalpel is done, all I'll have to do is arrange the drives based on which drives the samples are discovered on. Assuming the file was stored contiguously in the filesystem, do you think this would work?

Thanks for the info and thanks for an amazing utility/application.

Aaron

On Tue, Aug 2, 2011 at 8:39 AM, NeilBrown <neilb@suse.de> wrote:
> [...]
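The sample-extracting helper Aaron describes might look something like the sketch below. The file name is hypothetical, and the scalpel.conf line format used here ("extension  case-sensitivity  max-carve-size  header") follows the examples shipped with scalpel, so it should be verified against the local copy before a multi-day scan is started.

# Pull 512-byte needles from a known-good copy of the file, 30 KB apart, and emit
# them as header patterns for scalpel.
SAMPLE_LEN, GAP, COUNT = 512, 30 * 1024, 40

def hex_escape(data):
    return "".join("\\x%02x" % b for b in data)

with open("reference.mkv", "rb") as src, open("samples.conf", "w") as conf:
    for i in range(COUNT):
        src.seek(i * (SAMPLE_LEN + GAP))
        needle = src.read(SAMPLE_LEN)
        if len(needle) < SAMPLE_LEN:
            break
        # carve some context around each hit so neighbouring chunks are visible too
        conf.write("s%02d y 200000 %s\n" % (i, hex_escape(needle)))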
* Re: Grub-install, superblock corrupted/erased and other animals
From: NeilBrown @ 2011-08-03 5:01 UTC
To: Aaron Scheiner; +Cc: linux-raid

On Tue, 2 Aug 2011 18:16:56 +0200 Aaron Scheiner <blue@aquarat.za.net> wrote:

> Hi Neil! I mailed this list originally because of your website.
>
> Thanks for the info regarding layout, it's very helpful and explains a lot.
>
> I saved the output of the -E argument before zero-ing the surviving superblocks :

Good !

> /dev/sdd:
> [...]
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
> [...]

This md superblock is 4K from the start of the device. So installing GRUB must have destroyed more than the first 1 or 2 sectors... I hope it didn't get up to 272 sectors...

> and the second surviving superblock :
>
> /dev/sde1:
> [...]
>     Data Offset : 256 sectors
>    Super Offset : 8 sectors
> [...]

And this one has a slightly different data offset - 256 sectors!

So just "creating" an array on top of this set of devices will never restore your data as the data offsets are not all the same.

This will be a little bit tricky ... but maybe not too tricky.

First we need to find the correct order and data offsets.

I would get the raid6check.c code from mdadm-3.2.1. It has changed in 3.2.2 in a way that isn't really what you want.

It can be given (among other things) a list of devices and it will check if the RAID6 parity blocks are consistent. As RAID6 depends on order, this will be a good indication whether the devices are in the correct order or not.

Read the code to see what args it needs. Note in particular that each device can be accompanied by an 'offset' in sectors. So you might list /dev/sde1:256 /dev/sdd:272

I just noticed: one is a partition, the other is a whole device. That is a little unusual... be careful not to let it confuse you...

So using raid6check, check the parity of the first few stripes for every ordering of devices, and each choice of "256" or "272" as the sector offset. Hopefully only one will work.

The "--examine" output you provided doesn't have a "Device Role" or "Array Slot" line. That is very strange!! It should tell you at least which position in the array that device fills.

It just occurred to me that partitions might have been changed on you somehow when you installed GRUB.... probably not but it would be worth checking the first few hundred K for md metadata just in case it is there. A metadata block will start FC 4E 2B A9 (the 'Magic' above is little-endian), and will contain the "Name" of the array 'nasrat:0' at offset 32.

Once we find the data we will need to convince mdadm to create an array with all the different data_offsets. That will require a little bit of coding but is achievable.

NeilBrown
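Neil's "check the first few hundred K for md metadata" step can be scripted. A minimal sketch, reading only the first 512K of a device given on the command line; the magic bytes and the name-at-offset-32 detail are the ones he gives above, and anything it reports should still be inspected by hand.

# Scan the start of a drive for a stray 1.2 superblock: on-disk magic is
# FC 4E 2B A9 (0xa92b4efc little-endian) and the array name starts at offset 32.
import sys

MAGIC = bytes([0xfc, 0x4e, 0x2b, 0xa9])
SCAN  = 512 * 1024                      # how far into the device to look

dev = sys.argv[1]                       # e.g. ./findsb.py /dev/sdd
with open(dev, "rb") as f:
    data = f.read(SCAN)

pos = data.find(MAGIC)
while pos != -1:
    name = data[pos + 32 : pos + 64].rstrip(b"\x00")
    print("%s: possible superblock at byte %d (sector %d), name=%r"
          % (dev, pos, pos // 512, name))
    pos = data.find(MAGIC, pos + 4)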
* Re: Grub-install, superblock corrupted/erased and other animals
From: Aaron Scheiner @ 2011-08-03 8:59 UTC
To: NeilBrown; +Cc: linux-raid

mmm, learning experience :P

The Wikipedia page on grub says:
"
Stage 1 can load Stage 2 directly, but it's normally set up to load Stage 1.5. GRUB Stage 1.5 is located in the first 30 kilobytes of hard disk immediately following the MBR and before the first partition.
"
So if both stage 1 and stage 1.5 were written to the drives, approximately 30KBytes would have been overwritten, so 60 sectors?

Why is the data offset by 256 bytes? The array was created using a standard create command (only specifying raid level, devices, chunk size).

"I just noticed: one is a partition, the other is a whole device. That is a little unusual... be careful not to let it confuse you..."

...yes... I'm not sure how that happened heh.

I do have the dmesg log from the OS drive of the machine the array last ran on. It lists stuff like drive order (the cables have moved around since then) and devices.

I had to rush out this morning so I haven't been able to check the status of scalpel... and that's a great idea @ run in parallel.

On Wed, Aug 3, 2011 at 7:01 AM, NeilBrown <neilb@suse.de> wrote:
> [...]
* Re: Grub-install, superblock corrupted/erased and other animals
From: NeilBrown @ 2011-08-03 9:20 UTC
To: Aaron Scheiner; +Cc: linux-raid

On Wed, 3 Aug 2011 10:59:22 +0200 Aaron Scheiner <blue@aquarat.za.net> wrote:

> [...]
>
> Why is the data offset by 256 bytes? The array was created using a standard create command (only specifying raid level, devices, chunk size).

The offset is 256 sectors (128K).

The data obviously cannot go at the start of the drive as the metadata is there, so it is offset from the start. I leave 64K to allow for various different metadata (bitmap, bad-block log, other ideas I might have one day).

NeilBrown
* Re: Grub-install, superblock corrupted/erased and other animals
From: Aaron Scheiner @ 2011-08-05 10:04 UTC
To: NeilBrown, Stan Hoeppner; +Cc: linux-raid

I followed your advice and ran a scalpel instance for every drive in the array. The scanning process finished this morning yay (and obviously went a lot faster).

Firstly, here are some snippets from the dmesg log from the OS drive that last successfully assembled the array (before the data cables and power supply cables were scrambled due to a power supply being upgraded):

"
[    5.145251] md: bind<sde>
[    5.251753] md: bind<sdi>
[    5.345804] md: bind<sdh>
[    5.389398] md: bind<sdd>
[    5.549170] md: bind<sdf>
[    5.591170] md: bind<sdg1>
[    5.749707] md: bind<sdk>
[    5.952920] md: bind<sda>
[    6.153179] md: bind<sdj>
[    6.381157] md: bind<sdb>
[    6.388871] md/raid:md0: device sdb operational as raid disk 1
[    6.394742] md/raid:md0: device sdj operational as raid disk 6
[    6.400582] md/raid:md0: device sda operational as raid disk 8
[    6.406450] md/raid:md0: device sdk operational as raid disk 3
[    6.412290] md/raid:md0: device sdg1 operational as raid disk 7
[    6.418245] md/raid:md0: device sdf operational as raid disk 0
[    6.424097] md/raid:md0: device sdd operational as raid disk 2
[    6.429939] md/raid:md0: device sdh operational as raid disk 9
[    6.435807] md/raid:md0: device sdi operational as raid disk 4
[    6.441679] md/raid:md0: device sde operational as raid disk 5
[    6.448311] md/raid:md0: allocated 10594kB
[    6.452515] md/raid:md0: raid level 6 active with 10 out of 10 devices, algorithm 2
[    6.460218] RAID conf printout:
[    6.460219]  --- level:6 rd:10 wd:10
[    6.460221]  disk 0, o:1, dev:sdf
[    6.460223]  disk 1, o:1, dev:sdb
[    6.460224]  disk 2, o:1, dev:sdd
[    6.460226]  disk 3, o:1, dev:sdk
[    6.460227]  disk 4, o:1, dev:sdi
[    6.460229]  disk 5, o:1, dev:sde
[    6.460230]  disk 6, o:1, dev:sdj
[    6.460232]  disk 7, o:1, dev:sdg1
[    6.460234]  disk 8, o:1, dev:sda
[    6.460235]  disk 9, o:1, dev:sdh
[    6.460899] md0: bitmap initialized from disk: read 1/1 pages, set 3 bits
[    6.467707] created bitmap (15 pages) for device md0
[    6.520308] md0: detected capacity change from 0 to 16003181314048
[    6.527596]  md0: p1
"

I've taken the output from the files Scalpel generated and I've assembled them on a spreadsheet. It's a Google Docs spreadsheet and it can be viewed here:
https://spreadsheets.google.com/spreadsheet/ccc?key=0AtfzQK17PqSTdGRYbUw3eGtCM3pKVl82TzJWYWlpS3c&hl=en_GB

Samples from the first 1.3MBs of the test file were found in the following order:
sda sdk sdf sdh sdc sdb sdj sdi sdd

So now the next step would have been to re-create the array and check if a file system check finds something... but because of the offsets that probably won't work?

Thanks again :)

On Wed, Aug 3, 2011 at 11:20 AM, NeilBrown <neilb@suse.de> wrote:
> [...]
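The "arrange drives by where the samples turn up" step can be made mechanical. A sketch with made-up hit data rather than the spreadsheet's real numbers, using the 512-byte-sample / 30KB-gap spacing described earlier in the thread.

# If the reference file was contiguous, sample i sits roughly i*(512+30K) bytes into
# the file, so the 128K chunk it falls in tells you which data column (and hence
# which drive) it landed on.  The hits dict below is illustrative only.
SAMPLE_STRIDE = 512 + 30 * 1024
CHUNK = 128 * 1024

hits = {                      # drive -> sample indices scalpel found on it (example values)
    "/dev/sda": [0, 1, 2, 3, 4],
    "/dev/sdk": [5, 6, 7, 8],
    "/dev/sdf": [9, 10, 11, 12],
}

def column(sample_idx):
    return (sample_idx * SAMPLE_STRIDE) // CHUNK    # which 128K chunk of the file

order = sorted(hits, key=lambda d: min(column(i) for i in hits[d]))
print("data-column order:", order)                   # -> sda, sdk, sdf for this example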
* Re: Grub-install, superblock corrupted/erased and other animals
From: Stan Hoeppner @ 2011-08-05 10:32 UTC
To: Aaron Scheiner; +Cc: NeilBrown, linux-raid

On 8/5/2011 5:04 AM, Aaron Scheiner wrote:
> I followed your advice and ran a scalpel instance for every drive in the array. The scanning process finished this morning yay (and obviously went a lot faster).

Glad I could help.

...
> So now the next step would have been to re-create the array and check

Unless Neil has some hex editor (or other similar) trick up his sleeve that would allow you to manually recreate the sectors you hosed installing grub..

> if a file system check finds something... but because of the offsets that probably won't work ?

If you are able to use the Force to assemble the disks in the correct order, getting the raid device back up and running, then run 'xfs_repair -n /dev/md0' to do the check. The '-n' means "no modify". xfs_repair is better than xfs_check in many aspects. They are two separate code paths that serve the same function, but they behave a little differently under the hood.

> Thanks again :)

Wish I could have helped you recover the array. When a patient comes through emergency with 3 GSWs to the forehead and no pulse, there's nothing that can be done. :(

--
Stan
* Re: Grub-install, superblock corrupted/erased and other animals
From: Aaron Scheiner @ 2011-08-05 11:28 UTC
To: Stan Hoeppner; +Cc: NeilBrown, linux-raid

"Wish I could have helped you recover the array. When a patient comes through emergency with 3 GSWs to the forehead and no pulse, there's nothing that can be done. :("

Quite an analogy :P . Is it really that hopeless? And of interest, have you been in that situation? I have the order of the array... so it should just be a case of reconstructing it in the same way it was created. Only one device was partitioned (as can be seen from the dmesg logs) and that device is known.

Most of the array was backed up to another array, but there are some photos I'd like to recover. The array doesn't need to be functional any time soon (I'm happy to keep working on it for another 3 months if necessary).

Besides, it's a good learning experience.

I "re-created" the array using the drive order I specified in the previous e-mail, but replacing drive 9 with "missing". I then ran xfs_check on the md device and it failed (as it did with every other re-create attempt). I'll give your suggestion of xfs_repair -n a go :) .

I'll have a look at the start of the md device with a hex editor and see if there's anything interesting there.

On Fri, Aug 5, 2011 at 12:32 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> [...]
* Re: Grub-install, superblock corrupted/erased and other animals
From: NeilBrown @ 2011-08-05 12:16 UTC
To: Aaron Scheiner; +Cc: Stan Hoeppner, linux-raid

On Fri, 5 Aug 2011 13:28:41 +0200 Aaron Scheiner <blue@aquarat.za.net> wrote:

> [...]
>
> I "re-created" the array using the drive order I specified in the previous e-mail, but replacing drive 9 with "missing". I then ran xfs_check on the md device and it failed (as it did with every other re-create attempt). I'll give your suggestion of xfs_repair -n a go :)

When you create the array, what 'data offset' does mdadm choose? If not the same as the original, the filesystem will of course not look right.

You might need to hack mdadm (super1.c) to set a different data_offset.

NeilBrown
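Neil's question — what data offset did mdadm actually choose? — is quick to check after each experimental --create. A small sketch, with the device names and expected values taken from the superblock dumps earlier in the thread; it only reads --examine output and changes nothing.

# Compare the data offsets of the recreated members against the saved originals
# (256 sectors for the partitioned member, 272 for the whole-disk one).
import re, subprocess

EXPECTED = {"/dev/sde1": 256, "/dev/sdd": 272}   # from the saved --examine output

for dev, want in EXPECTED.items():
    out = subprocess.run(["mdadm", "--examine", dev],
                         capture_output=True, text=True).stdout
    m = re.search(r"Data Offset\s*:\s*(\d+)\s*sectors", out)
    got = int(m.group(1)) if m else None
    print("%-10s data offset now %s sectors (original %d)" % (dev, got, want))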
* Re: Grub-install, superblock corrupted/erased and other animals
From: Stan Hoeppner @ 2011-08-03 7:13 UTC
To: Aaron Scheiner; +Cc: linux-raid

On 7/27/2011 7:16 AM, Aaron Scheiner wrote:
> The drive on which the OS sat died a few days ago, so I installed a new OS drive and then installed Ubuntu Server on it.
> On reboot the machine hung on a black screen with a white flashing cursor. So I went back into the Ubuntu Setup and installed grub to all of the drives in the raid array (except two) [wow, this was such a stupid move].
>
> I then rebooted the machine and it successfully booted into Ubuntu Server. I set about restoring the configuration for the raid array... only to be given the message "No Superblock Found" (more or less). Each element in the array was used directly by mdadm (so /dev/sda, not /dev/sda1).

...

I should have mentioned these thoughts previously, but often folks on this list get tired of hearing it over and over, similar to Mom saying "clean your room" or road signs every few miles telling you to "buckle up". People get tired of it, but there's a good reason for it: to prevent future heartache.

Now that you've seen first hand how fragile Linux RAID can be in some circumstances in the hands of less than expert users, you may want to consider taking measures to prevent the consequences of this kind of thing in the future:

1. Make regular backups of your data and configuration files
   - use a flash pen drive for the config files
   - use whatever you can afford for data files
   - if you can't afford to backup 16TB, use DVD-R for important files and sacrifice the rest
2. Use kernel supported hardware RAID everywhere if you can afford it
3. At minimum use a bootable hardware RAID card with RAID1 drives

In this case #3 would have saved your bacon. One possible Linux compatible solution would be the following:

Qty 1: http://www.newegg.com/Product/Product.aspx?Item=N82E16816116075
Qty 2: http://www.newegg.com/Product/Product.aspx?Item=N82E16820167062

and

Qty 1: http://www.newegg.com/Product/Product.aspx?Item=N82E16817996013
or
Qty 1: http://www.newegg.com/Product/Product.aspx?Item=N82E16817986003

to house the SSDs, depending on what free bays you have.

Mirror the two Intel Enterprise SSDs with the controller and install Ubuntu on the RAID device. The mainline kernel supports the 3Ware 9650 series so it should simply work. You'll only have 20GB but that should be more than plenty for a boot/root filesystem drive.

This is the least expensive hardware RAID1 boot setup I could piece together but at the same time the fastest. This will run you ~$435-$460 USD. I have no idea of cost or availability of these in the UK. For about $20 USD more per drive you could use the 500GB Seagate Constellation Enterprise 2.5" 7.2k drives:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148711

If you must substitute, simply make sure you get an "enterprise" type drive/SSD as they have TLER support and other firmware features that allow them to play nice with real RAID cards.

--
Stan