Re: [RFC] dynamic inodes - Jose R. Santos

public inbox for linux-ext4@vger.kernel.org

From: "Jose R. Santos" <jrs@us.ibm.com>
To: Andreas Dilger <adilger@sun.com>
Cc: Alex Tomas <bzzz@sun.com>, ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [RFC] dynamic inodes
Date: Fri, 26 Sep 2008 09:49:03 -0500
Message-ID: <20080926094903.08e68f5b@gara>
In-Reply-To: <20080926103607.GB10950@webber.adilger.int>

On Fri, 26 Sep 2008 04:36:07 -0600
Andreas Dilger <adilger@sun.com> wrote:

> On Sep 25, 2008  20:10 -0500, Jose R. Santos wrote:
> > One way to get around this is to implement the exact opposite of what I
> > proposed earlier and have a block group with no inode tables.  If we do
> > a 1:1 distribution of inodes per block and don't allocate inode tables
> > for a series of block groups within a flexbg, we could later on attempt
> > to allocate new inode tables when we run out of inodes.  If we leave
> > holes in the inode numbers for the missing inode tables, adding new
> > inode tables in these block groups would not require any inode
> > renumbering.  This also does not break the current inode allocator
> > which would be a good thing.  This should be even simpler to implement
> > than the previous proposal.  The drawbacks are that when allocating a
> > new inode table, the 1:1 distribution of inodes per block would mean
> > that we need to find a bigger chunk of contiguous blocks, since we
> > have bigger inode tables per block group.  Since the current inode
> > allocator tries to keep 10% of the blocks in a flexbg free, finding
> > contiguous blocks may not be a really big issue.  Another issue is 64-bit
> > filesystems if we use a 1:1 scheme.
> > 
> > This would be like uninitialized inode tables with the added steps of
> > finding free blocks, allocating a new inode and zeroing the newly
> > created inode table.  Since we could choose to allocate a new inode
> > table on a flexbg with the most free blocks, this could keep filesystem
> > meta-data/data layout consistently close together to maintain
> > predictable performance.  This option also has no overhead compared to
> > the previous proposal.
> 
> The problem with leaving gaps in the itable is that this needs the
> filesystem to be created in this manner in the first place, while adding
> them at the end can be done to any filesystem.  If we are preparing the
> filesystem in advance for this we could just reserve enough GDT space
> too (as online resize already does to some extent).

Agreed, but performance-wise this way is more consistent with the
current block and inode allocators.  The block allocator will start its
free block search on the block group that contains the inode.  Since
these block groups do not contain any blocks, the block allocator will
have to be modified to make sure data is not being placed randomly on the
disk.  The flex_bg inode allocator would also need to be modified since
it currently depends on an algorithm that assumes that block groups
contain actual blocks.  One of the things that got flex_bg added to
ext4 in the first place was the performance improvements it
provided.  I would like to keep that advantage if possible.
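
To make the placement idea from my earlier proposal a bit more concrete,
here is a rough sketch of choosing the flex_bg that receives a new inode
table: take the one with the most free blocks, but only if it still keeps
about 10% of its blocks free afterward.  FlexGroup and
pick_flex_for_itable are hypothetical names, not kernel code, and the
sketch does not check whether the free space is actually contiguous.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlexGroup:
    # Hypothetical summary of one flex_bg; field names are illustrative only.
    total_blocks: int
    free_blocks: int


def pick_flex_for_itable(flex_groups: list[FlexGroup], itable_blocks: int,
                         reserve_pct: int = 10) -> Optional[int]:
    """Return the index of the flex_bg with the most free blocks that can
    hold a new inode table of itable_blocks while still keeping at least
    reserve_pct percent of its blocks free, or None if none qualifies.

    Assumption: free space is treated as one pool; contiguity is ignored.
    """
    best: Optional[int] = None
    for idx, fg in enumerate(flex_groups):
        left = fg.free_blocks - itable_blocks
        # Integer form of: left >= reserve_pct% of this flex_bg's blocks.
        if left * 100 < reserve_pct * fg.total_blocks:
            continue
        if best is None or fg.free_blocks > flex_groups[best].free_blocks:
            best = idx
    return best


if __name__ == "__main__":
    flexes = [FlexGroup(1000, 150), FlexGroup(1000, 300), FlexGroup(1000, 120)]
    print(pick_flex_for_itable(flexes, 100))  # 1
    print(pick_flex_for_itable(flexes, 250))  # None
```

With three 1000-block flex_bgs holding 150, 300 and 120 free blocks, a
100-block table lands in the second one; a 250-block table cannot go
anywhere without dipping below the reserve, so the caller would have to
fall back to something else.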

This could also be used to speed up mkfs since we would not need to zero
out as many inode tables.  We could initialize just a couple of inode
tables per flex_bg and allocate the rest dynamically.  You do pay
a small penalty when allocating a new inode table since we first need
to find the blocks for that inode table and then zero it.
The penalty is less than if we do the one-time background zeroing of
inode tables, where your disk will be thrashing for a while the first
time it is mounted.
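
Some rough numbers for what has to be found and zeroed, assuming 4 KiB
blocks, 32768 blocks per group, 256-byte inodes, 16 groups per flex_bg,
and a typical 16 KiB-per-inode ratio as the comparison point.  All of
these are assumptions for illustration, and the helper names are made up.

```python
# Hypothetical sizing helpers; the geometry used below is an assumption.

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def itable_blocks(inodes_per_group: int, inode_size: int, block_size: int) -> int:
    """Blocks one group's inode table occupies (and that must be zeroed)."""
    return ceil_div(inodes_per_group * inode_size, block_size)


def groups_zeroed_at_mkfs(ngroups: int, groups_per_flex: int,
                          tables_per_flex: int) -> int:
    """Inode tables mkfs would zero if only tables_per_flex per flex_bg are
    initialized up front and the rest are allocated on demand."""
    full, rem = divmod(ngroups, groups_per_flex)
    return full * min(tables_per_flex, groups_per_flex) + min(tables_per_flex, rem)


def one_to_one_max_blocks() -> int:
    """With one inode per block, 32-bit inode numbers cap the block count."""
    return 2**32 - 1


if __name__ == "__main__":
    bs, bpg, isz = 4096, 32768, 256
    one_to_one = itable_blocks(bpg, isz, bs)             # 1 inode per block
    default = itable_blocks(bpg * bs // 16384, isz, bs)  # 1 inode per 16 KiB
    print(one_to_one, default)                           # 2048 512
    ngroups = (1 << 40) // (bpg * bs)                    # 1 TiB filesystem
    print(ngroups, groups_zeroed_at_mkfs(ngroups, 16, 2))  # 8192 1024
    # Largest 1:1 filesystem in TiB: just under 16 with 4 KiB blocks.
    print(one_to_one_max_blocks() * bs / (1 << 40))
```

Under those assumptions a 1:1 inode table is 2048 blocks (8 MiB) per
group versus 512 blocks for the 16 KiB ratio, which is the bigger
contiguous chunk mentioned above.  On a 1 TiB filesystem, initializing
two tables per flex_bg zeroes 1024 of 8192 tables at mkfs time.  The last
helper shows the 64-bit concern: at one inode per block, 32-bit inode
numbers cap the filesystem just under 2^32 blocks.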

If supporting already existing filesystems is really important, we could
always implement both techniques since they technically should not
conflict with each other, though you couldn't use both of them at the
same time if you have a 1:1 block/inode ratio.
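
Both techniques avoid renumbering for the same reason: ext4 locates inode
N in group (N - 1) / inodes_per_group, so an inode's number never depends
on which groups currently have an inode table.  A minimal sketch of that
mapping follows; GroupDesc and ino_is_backed are hypothetical helpers, not
kernel structures.

```python
from dataclasses import dataclass


@dataclass
class GroupDesc:
    # Hypothetical per-group descriptor holding only what this sketch needs.
    has_itable: bool  # False: no inode table yet, so the group is a "hole"


def ino_to_group(ino: int, inodes_per_group: int) -> tuple[int, int]:
    """Map a 1-based inode number to (group, index within that group).

    Mirrors the (ino - 1) / inodes_per_group arithmetic ext4 uses, so the
    result depends only on inodes_per_group, never on which groups
    currently have inode tables.
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    return divmod(ino - 1, inodes_per_group)


def ino_is_backed(ino: int, inodes_per_group: int, groups: list[GroupDesc]) -> bool:
    """True if ino lands in an existing group whose inode table is allocated."""
    group, _ = ino_to_group(ino, inodes_per_group)
    return group < len(groups) and groups[group].has_itable


if __name__ == "__main__":
    ipg = 8192
    groups = [GroupDesc(True), GroupDesc(False), GroupDesc(False), GroupDesc(True)]
    ino = 3 * ipg + 5                      # lives in group 3, which has a table
    before = ino_to_group(ino, ipg)
    groups[1].has_itable = True            # later: fill the hole in group 1
    groups.append(GroupDesc(True))         # or: add a new group at the end
    after = ino_to_group(ino, ipg)
    print(before, after, before == after)  # (3, 4) (3, 4) True
```

Filling a hole only makes a previously unusable range of inode numbers
valid, and appending groups only adds numbers past the current maximum;
in neither case does an existing inode move.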

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 

-JRS

Thread overview: 15+ messages
2008-09-24 11:46 [RFC] dynamic inodes Alex Tomas
2008-09-25 22:09 ` Andreas Dilger
2008-09-25 23:00   ` Alex Tomas
2008-09-25 23:29     ` Andreas Dilger
2008-09-30 14:02       ` Alex Tomas
2008-09-25 22:37 ` Andreas Dilger
2008-09-26  1:10   ` Jose R. Santos
2008-09-26 10:36     ` Andreas Dilger
2008-09-26 14:49       ` Jose R. Santos [this message]
2008-09-26 20:01         ` Andreas Dilger
2008-09-26  2:11   ` Theodore Tso
2008-09-26 10:33     ` Andreas Dilger
2008-09-26 14:33       ` Theodore Tso
2008-09-26 20:18         ` Andreas Dilger
2008-09-26 22:26           ` Theodore Tso
