From: Aaron Scheiner <blue@aquarat.za.net>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: Grub-install, superblock corrupted/erased and other animals
Date: Tue, 2 Aug 2011 18:24:45 +0200
Message-ID: <CADz4AWFJHah=OkZ0JaC8ZZD+E13D6tOvhz7cRR0oc19OfsOQew@mail.gmail.com>
In-Reply-To: <4E37AEE8.1080108@hardwarefreak.com>

Wow... I had no idea XFS was that complex: great for performance,
horrible for file recovery :P. Thanks for the explanation.

Based on this, the scalpel-plus-lots-of-samples approach might not
work. I'll investigate XFS a little more closely; I had just assumed
it would write big files as one contiguous block.
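
For when I do get a mountable copy of the filesystem again: as far as
I can tell, xfs_bmap should show how a file's extents are actually
laid out, which would confirm whether a big file really is one run of
blocks or is scattered across AGs. Something like this on a mounted
XFS filesystem (the path is just a placeholder):

  # Print the extent map of a file on mounted XFS; -v adds the block
  # ranges and which allocation group each extent lives in.
  xfs_bmap -v /mnt/recovered/some-big-video.mkv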

This makes a lot of sense. I re-created the array using a random
drive order, ran scalpel against the md device looking for the start
of the video file, and found it. I then dd'ed that region out to a
file on a hard drive and loaded it into a hex editor. The file data
ended abruptly after roughly 384KB, and I couldn't find any other
data belonging to the file within 50MB of the sample scalpel had
found.
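
In case it helps anyone trying the same thing, the carve step looked
roughly like this (the device name, offset and sizes below are
placeholders, not my real values):

  # Byte offset where scalpel reported the start-of-file signature (placeholder).
  OFFSET=123456789012
  # Copy a 50MB window starting at that offset out of the assembled array,
  # using a 512-byte block size so the offset only needs to be sector-aligned.
  dd if=/dev/md0 of=/tmp/sample.bin bs=512 skip=$((OFFSET / 512)) count=$((50 * 1024 * 2))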

Thanks again for the info.

On Tue, Aug 2, 2011 at 10:01 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 8/2/2011 1:39 AM, NeilBrown wrote:
>> On Wed, 27 Jul 2011 14:16:52 +0200 Aaron Scheiner <blue@aquarat.za.net> wrote:
>
>>> Do these segments follow on from each other without interruption or is
>>> there some other data in-between (like metadata? I'm not sure where
>>> that resides).
>>
>> That depends on how XFS lays out the data.  It will probably be mostly
>> contiguous, but no guarantees.
>
> Looks like he's still under the 16TB limit (8*2TB drives), so this is an
> 'inode32' XFS filesystem.  inode32 and inode64 have very different
> allocation behavior.  I'll take a stab at an answer, and though the
> following is not "short" by any means, it's not nearly long enough to
> fully explain how XFS lays out data on disk.
>
> With inode32, all inodes (metadata) are stored in the first allocation
> group, maximum 1TB, with file extents in the remaining AGs.  When the
> original array was created (and this depends a bit on how old his
> kernel/xfs module/xfsprogs are) mkfs.xfs would have queried mdraid for
> the existence of a stripe layout.  If found, mkfs.xfs would have created
> 16 allocation groups of 500GB each, the first 500GB AG being reserved
> for inodes.  inode32 writes all inodes to the first AG and distributes
> files fairly evenly across top level directories in the remaining 15 AGs.
>
> This allocation parallelism is driven by directory count.  The more top
> level directories, the greater the filesystem write parallelism.  inode64
> is much better, as inodes are spread across all AGs instead of being
> limited to the first AG, giving metadata-heavy workloads a boost (e.g.
> maildir).  inode32 filesystems are limited to 16TB in size, while
> inode64 is limited to 16 exabytes.  inode64 requires a fully 64-bit
> Linux operating system, and though inode64 scales far beyond 16TB, one
> can use it on much smaller filesystems for the added benefits.
>
> This allocation behavior is what allows XFS to achieve high performance
> with large files, as free space management within and across multiple
> allocation groups keeps file fragmentation to a minimum.  Thus, there
> are normally large spans of free space between AGs on a partially
> populated XFS filesystem.
>
> So, to answer the question, if I understood it correctly, there will
> indeed be data spread across all of the disks, with large free space
> chunks in between.  The pattern of files on disk will not be contiguous.
> Again, this is by design, and it yields superior performance for large
> file workloads, the design goal of XFS.  It doesn't do horribly with
> many small file workloads either.
>
> --
> Stan
>
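
Out of curiosity I'll also check what geometry mkfs.xfs actually
recorded when the array was first built. If superblock 0 is still
intact on the re-assembled device, something along these lines should
print the allocation group count/size and the stripe unit/width,
read-only (the device name is just an example):

  # Dump a few fields from XFS superblock 0 on the assembled md device (read-only).
  # agcount/agblocks describe the AG layout; unit/width are the stripe geometry
  # (in filesystem blocks) that mkfs.xfs picked up from md at mkfs time.
  xfs_db -r -c 'sb 0' -c 'p agcount agblocks unit width' /dev/md0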