From: Andreas Dilger <adilger@sun.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Alex Tomas <bzzz@sun.com>, ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [RFC] dynamic inodes
Date: Fri, 26 Sep 2008 04:33:22 -0600
Message-ID: <20080926103322.GA10950@webber.adilger.int>
In-Reply-To: <20080926021132.GA11413@mit.edu>

On Sep 25, 2008  22:11 -0400, Theodore Ts'o wrote:
> On Thu, Sep 25, 2008 at 04:37:31PM -0600, Andreas Dilger wrote:
> > If one adds a new group (ostensibly "at the end of the filesystem") that
> > has a flag which indicates there are no blocks available in the group,
> > then what we get is the inode bitmap and inode table, with a 1-block
> > "excess baggage" of the block bitmap and a new group descriptor.  The
> > "baggage" is small considering any overhead needed to locate and describe
> > fully dynamic inode tables.
>
> It's a good idea; and technically you don't have to allocate a block
> bitmap, given that the flag is present which says "no blocks
> available".  The reason for allocating it is if you're trying to
> maintain full backwards compatibility, it will work --- except that
> you need some way of making sure that the on-line resizing code won't
> screw with the filesystem --- so the feature would have to be a
> read/only compat feature anyway.

Sure, I agree it is possible to go either way.  I was just trying to
follow the principle of least surprise.  Having a group with
"bg_block_bitmap = 0" would be strange, but no more strange than having
a group for blocks beyond the end of the filesystem...
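
To make the "inode-only group" idea concrete, here is a minimal sketch in
Python (a toy model, not kernel code).  The flag name BG_INODE_ONLY and its
value are assumptions made up for illustration; no such bg_flags bit exists
in ext4.  The inode-number-to-group mapping is the usual one, with inode
numbers starting at 1.

```python
from dataclasses import dataclass

# Hypothetical flag value for illustration; not an existing ext4 bg_flags bit.
BG_INODE_ONLY = 0x8000


@dataclass
class GroupDesc:
    block_bitmap: int   # block number of the block bitmap (0 = none allocated)
    inode_bitmap: int   # block number of the inode bitmap
    inode_table: int    # first block of the inode table
    free_blocks: int
    flags: int = 0


def add_inode_only_group(groups, inode_bitmap, inode_table, block_bitmap=0):
    """Append a group that carries inodes but contributes no blocks."""
    groups.append(GroupDesc(block_bitmap, inode_bitmap, inode_table,
                            free_blocks=0, flags=BG_INODE_ONLY))
    return len(groups) - 1


def inode_to_group(ino, inodes_per_group):
    """Map an inode number to (group, index within the group's inode table)."""
    return divmod(ino - 1, inodes_per_group)


def allocatable_groups(groups):
    """Groups the block allocator may use; inode-only groups are skipped."""
    return [i for i, gd in enumerate(groups) if not gd.flags & BG_INODE_ONLY]
```

With two ordinary groups and 8192 inodes per group, adding one inode-only
group makes inode 16385 resolve to (group 2, index 0), while the block
allocator still sees only groups 0 and 1.  Passing block_bitmap=0 models the
"no bitmap at all" variant; passing a real block number models the
backwards-compatible variant with the 1-block overhead.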

> To do on-line resizing, you'd have to clear the flag and then know
> that the first "inode-only" block group should be given the new
> blocks.

Right.

> > The itable location would be replicated to all of the group descriptor
> > backups for safety, though we would need to find a way for "META_BG"
> > to store a backup of the GDT in blocks that don't exist, in the case
> > where increasing the GDT size in-place isn't possible.
>
> This is actually the big problem; with META_BG, in order to find the
> group descriptor blocks, it assumes that the first group descriptor
> can be found at the beginning of the group descriptor block, which
> means it has to be found at a certain offset from the beginning of the
> filesystem.  And this would not be true for inode-only block groups.

We could special-case the placement of the GDT blocks in this case, and
then put them into the proper META_BG location when/if the blocks are
actually added to the filesystem.
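
A rough sketch of that special case, again as a Python toy model.  The
location formula reflects my understanding of the META_BG layout (the
descriptor block sits at the start of the first group of each meta-group,
after the superblock copy if that group has one); the fallback_block
parameter and the gdt_placement() helper are hypothetical names, and how the
fallback location would be chosen and recorded is left open.

```python
def meta_bg_gdt_block(meta_group, block_size=4096, desc_size=32,
                      blocks_per_group=32768, first_data_block=0,
                      has_super=False):
    """Block where META_BG expects the descriptor block of a meta-group."""
    descs_per_block = block_size // desc_size
    first_group = meta_group * descs_per_block
    return (first_data_block + first_group * blocks_per_group
            + (1 if has_super else 0))


def gdt_placement(meta_group, blocks_count, fallback_block, **geometry):
    """Return (block, is_special).

    If the proper META_BG location lies beyond the end of the filesystem
    (an inode-only group), use the caller-chosen fallback_block instead.
    """
    block = meta_bg_gdt_block(meta_group, **geometry)
    if block < blocks_count:
        return block, False
    return fallback_block, True
```

For a 100-group filesystem with 4kB blocks (3276800 blocks), meta-group 1
starts at group 128, so its proper location is block 4194304, which does not
exist yet; gdt_placement() returns the fallback with is_special set.  Once a
resize pushes blocks_count past that block, the same call returns the proper
location and the descriptor block can be moved there.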

> The simplest solution actually would be to allocate inodes from the
> *end* of the 32-bit inode space, growing downwards, and having those
> inodes be stored in a reserved inode.  You would lose block locality,
> although that could be solved by adding a block group affinity field
> in the inode structure which is used by "extended inodes".

I don't see how growing the inode numbers downward really helps anything.
With FLEX_BG there already is no "affinity" between the inodes and the
blocks.  The drawback of putting the inode table into an inode is that
this is relatively fragile if the inode is corrupted.  We'd want to have
replication of the inode itself (we couldn't replicate the whole inode
table very efficiently).

Alternately, we could put the GDT into the inode and replicate the whole
inode several times (the data would already be present in the filesystem).
We just need to select inodes from disparate parts of the filesystem to
avoid corruption (I'd suggest one inode from each backup superblock
group), point them at the existing GDT blocks, then allow the new GDT
blocks to be added to each one.  The backup GDT-inode copies only need
to be changed when new groups are added/removed.
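
As a sketch of the replica selection (a Python toy model under stated
assumptions): backup_groups() follows the sparse_super rule of keeping
backups in groups 0, 1 and powers of 3, 5 and 7.  The two callbacks,
gdt_blocks_in_group and allocate_near, are hypothetical stand-ins for
"where this group's existing GDT copy lives" and "allocate a block near this
group"; nothing here is an existing kernel or e2fsprogs interface.

```python
def backup_groups(ngroups):
    """Groups holding backup superblocks under sparse_super."""
    groups = {0}
    if ngroups > 1:
        groups.add(1)
    for base in (3, 5, 7):
        g = base
        while g < ngroups:
            groups.add(g)
            g *= base
    return sorted(groups)


def make_gdt_inode_replicas(ngroups, gdt_blocks_in_group):
    """One GDT-inode per backup group, pointing at that group's GDT copy.

    gdt_blocks_in_group(g) is a hypothetical callback returning the block
    numbers of the GDT copy already stored in group g.
    """
    return {g: list(gdt_blocks_in_group(g)) for g in backup_groups(ngroups)}


def add_gdt_block(replicas, allocate_near):
    """Extend every replica when new groups need another GDT block.

    allocate_near(g) is a hypothetical allocator returning a free block
    close to group g.
    """
    for g, blocks in replicas.items():
        blocks.append(allocate_near(g))
```

For a 100-group filesystem this picks groups 0, 1, 3, 5, 7, 9, 25, 27, 49
and 81.  The replicas are only touched in add_gdt_block(), which matches the
point that the backup copies change only when groups are added or removed.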

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

