From: Phillip Susi <psusi@ubuntu.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Status of META_BG?
Date: Fri, 16 Mar 2012 09:42:19 -0400
Message-ID: <4F63433B.1020904@ubuntu.com>
In-Reply-To: <0A38CCE3-2F78-4B0E-9D5E-6C261EA61902@dilger.ca>
On 3/15/2012 5:06 PM, Andreas Dilger wrote:
>> To get an fs that large, you have to enable 64bit support, which also means you can pass the limit of 32k blocks per group.
>
> I'm not sure what you mean here. Sure, there can be more than 32k
> blocks per group, but there is still only a single block bitmap per
> group so having more blocks is dependent on a larger blocksize.
Heh, I'm not sure what you mean here. What does the block bitmap have
to do with anything? I thought the issue was that, with a huge number
of block groups, the block group descriptor table grows larger than a
single block group, which is limited to 128 MB.
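
A rough sketch of the arithmetic behind both points, assuming 4 KiB
blocks and group descriptors of 32 bytes (or 64 bytes with 64bit
enabled); those sizes and all function names are illustrative
assumptions, not e2fsprogs code. It derives the blocks-per-group limit
from a single one-block bitmap, and the filesystem size at which the
primary descriptor table alone would fill a whole group (ignoring
backups and reserved GDT blocks):

```c
#include <assert.h>
#include <stdint.h>

/* One block bitmap per group, one bit per block: 8 * block_size blocks. */
uint64_t blocks_per_group(uint64_t block_size)
{
    return 8 * block_size;
}

/* Number of groups whose descriptors would exactly fill one whole group. */
uint64_t groups_to_fill_one_group(uint64_t block_size, uint64_t desc_size)
{
    assert(desc_size > 0 && block_size % desc_size == 0);
    uint64_t desc_per_block = block_size / desc_size;
    return desc_per_block * blocks_per_group(block_size);
}

/* Filesystem size in bytes at which the table reaches that point. */
uint64_t fs_bytes_at_limit(uint64_t block_size, uint64_t desc_size)
{
    return groups_to_fill_one_group(block_size, desc_size)
           * blocks_per_group(block_size) * block_size;
}
```

Under these assumptions a 4 KiB block size gives 32768 blocks (128 MiB)
per group, and the table fills a group at 2^49 bytes with 32-byte
descriptors or 2^48 bytes with 64-byte descriptors.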
>> Doing that should allow for a much more reasonable number of groups (which is a good thing for several reasons), and would also solve this problem, wouldn't it?
>
> Possibly in conjunction with BIGALLOC.
BIGALLOC?
>> So it puts one GD block at the start of every several block groups?
>
> One at the start of the first group, the second group, and the last
> group.
You mean one copy of the whole table? That's not what the current code
in e2fsprogs looks like it does to me. openfs.c has:
> blk64_t ext2fs_descriptor_block_loc2(ext2_filsys fs, blk64_t group_block,
>                                      dgrp_t i)
> {
>         int     bg;
>         int     has_super = 0;
>         blk64_t ret_blk;
>
>         if (!(fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG) ||
>             (i < fs->super->s_first_meta_bg))
>                 return (group_block + i + 1);
>
>         bg = EXT2_DESC_PER_BLOCK(fs->super) * i;
>         if (ext2fs_bg_has_super(fs, bg))
>                 has_super = 1;
>         ret_blk = ext2fs_group_first_block2(fs, bg) + has_super;
That appears to map the GDT block number to a block group based on how
many group descriptors fit in a block, so there's one GDT block every
several block groups. The subsequent code then checks if it is being
asked for a backup and shifts the result over by one whole block group,
so it looks like there is exactly one backup, whose blocks are each
stored in the block group following the one that holds the corresponding
primary GDT block.
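
The mapping described above can be modelled in a few lines. This is a
simplified, standalone sketch and not the e2fsprogs function: struct
fs_model, group_has_super() and gdt_block_loc() are hypothetical names,
sparse_super is assumed to be enabled, and the backup branch (including
re-checking for a superblock in the following group) is an assumption
based on the reading of the code given here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the superblock fields used here. */
struct fs_model {
    uint64_t first_data_block;  /* 0 for block sizes above 1 KiB */
    uint64_t blocks_per_group;
    uint64_t blocks_count;
    uint64_t desc_per_block;    /* block size / descriptor size */
    uint64_t first_meta_bg;     /* GDT blocks below this use the old layout */
    int      meta_bg;           /* nonzero if META_BG is enabled */
};

/* Return 1 if n is a power of base (base^0 == 1 counts). */
static int is_power_of(uint64_t n, uint64_t base)
{
    while (n > 1 && n % base == 0)
        n /= base;
    return n == 1;
}

/* Assumed sparse_super rule: groups 0 and 1, and powers of 3, 5 and 7. */
int group_has_super(uint64_t group)
{
    if (group <= 1)
        return 1;
    return is_power_of(group, 3) || is_power_of(group, 5) ||
           is_power_of(group, 7);
}

/* Block number of GDT block i; want_backup selects the copy one group on. */
uint64_t gdt_block_loc(const struct fs_model *fs, uint64_t i, int want_backup)
{
    assert(fs->blocks_per_group > 0 && fs->desc_per_block > 0);

    /* Old layout (primary copy only in this model): the table sits
     * contiguously right after the superblock. */
    if (!fs->meta_bg || i < fs->first_meta_bg)
        return fs->first_data_block + i + 1;

    /* First group covered by GDT block i. */
    uint64_t bg = fs->desc_per_block * i;

    if (want_backup) {
        uint64_t next = fs->first_data_block + (bg + 1) * fs->blocks_per_group;
        /* Only shift over a group if that location is inside the fs. */
        if (next + (uint64_t)group_has_super(bg + 1) < fs->blocks_count)
            bg += 1;
    }
    return fs->first_data_block + bg * fs->blocks_per_group +
           (uint64_t)group_has_super(bg);
}
```

With 32768 blocks per group and 128 descriptors per block, GDT block 0
lands at block 1 (just after the superblock in group 0) and GDT block 1
lands at the first block of group 128, which has no superblock copy.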
>> Wouldn't that drastically slow down opening/mounting the fs since the disk has to seek to every block group?
>
> Yes, definitely. That wasn't a concern before flex_bg arrived, since
> that seek was needed for every group's block/inode bitmap as well.
But you don't need to scan every bitmap at mount time do you? Aren't
they loaded on demand when the group is first accessed? But you do need
to scan all of the group descriptors at mount time.
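
To put a number on the mount-time cost, here is a small sketch that
counts how many descriptor blocks there are and how many separate disk
regions hold the primary copies. The function names are hypothetical,
and it assumes that with META_BG every GDT block lives in a different
block group (that is, s_first_meta_bg is 0), while without it the whole
table is contiguous:

```c
#include <assert.h>
#include <stdint.h>

/* Number of GDT blocks needed to describe ngroups block groups. */
uint64_t gdt_blocks(uint64_t ngroups, uint64_t desc_per_block)
{
    assert(desc_per_block > 0);
    return (ngroups + desc_per_block - 1) / desc_per_block;
}

/* Number of separate disk regions holding the primary descriptors. */
uint64_t gdt_regions(uint64_t ngroups, uint64_t desc_per_block, int meta_bg)
{
    uint64_t blocks = gdt_blocks(ngroups, desc_per_block);

    if (blocks == 0)
        return 0;
    /* Assumption: one region per GDT block with META_BG, else one run. */
    return meta_bg ? blocks : 1;
}
```

For example, 131072 groups at 128 descriptors per block need 1024 GDT
blocks: one contiguous read without META_BG, but 1024 scattered reads
with it.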
> Maybe with bigalloc the number of groups is reduced, and the size
> of the groups is increased, which helps two ways. First, fewer
> groups means fewer GD blocks, and larger groups mean more GD blocks
> can fit into the 0th and 1st groups.
That's what I was talking about. I'm not sure what bigalloc is, but
once you enable 64bit, that gets you the ability to have more than 32768
blocks per group, so you have fewer groups and more room in them.
> Well, the "mke2fs -S" is only applying a best guess estimate of the
> metadata location using default parameters. If the default parameters
> are not identical (e.g. flex_bg on/off, bigalloc on/off, etc) then
> "mke2fs -S" will only corrupt an already-fatally-corrupted filesystem,
> and you need to start from scratch.
That's true of mke2fs -S, but you could do the same thing while
consulting the existing superblock to determine the parameters. I
believe that all parameters that affect the contents of the GDT can be
found in the superblock, specifically: block size, blocks per group,
and flex factor. Given that information, e2fsck should be able to
rebuild the GDT.
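
As a minimal sketch of the first step of such a rebuild, the group
count, the number of GDT blocks, and the first block of each group
follow directly from a handful of superblock values. struct sb_params
and its field names are hypothetical stand-ins rather than the real
superblock layout, and the sketch does not attempt to locate bitmaps or
inode tables, which is where the flex factor would come in:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of superblock parameters, already decoded. */
struct sb_params {
    uint64_t blocks_count;      /* total blocks in the filesystem */
    uint64_t first_data_block;  /* 1 for 1 KiB blocks, otherwise 0 */
    uint64_t blocks_per_group;
    uint32_t block_size;        /* in bytes */
    uint32_t desc_size;         /* bytes per group descriptor */
};

/* Number of block groups, rounding up for a short last group. */
uint64_t sb_group_count(const struct sb_params *sb)
{
    assert(sb->blocks_per_group > 0 &&
           sb->blocks_count > sb->first_data_block);
    uint64_t data_blocks = sb->blocks_count - sb->first_data_block;
    return (data_blocks + sb->blocks_per_group - 1) / sb->blocks_per_group;
}

/* Number of blocks the group descriptor table occupies. */
uint64_t sb_gdt_block_count(const struct sb_params *sb)
{
    assert(sb->desc_size > 0 && sb->block_size >= sb->desc_size);
    uint64_t per_block = sb->block_size / sb->desc_size;
    return (sb_group_count(sb) + per_block - 1) / per_block;
}

/* First block of a given group. */
uint64_t sb_group_first_block(const struct sb_params *sb, uint64_t group)
{
    return sb->first_data_block + group * sb->blocks_per_group;
}
```

Because every input comes from the superblock, none of this depends on
the defaults of whichever mke2fs happens to be installed.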