Re: [RFC] dynamic inodes - Andreas Dilger


From: Andreas Dilger <adilger@sun.com>
To: "Jose R. Santos" <jrs@us.ibm.com>
Cc: Alex Tomas <bzzz@sun.com>, ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [RFC] dynamic inodes
Date: Fri, 26 Sep 2008 14:01:45 -0600
Message-ID: <20080926200145.GF10950@webber.adilger.int>
In-Reply-To: <20080926094903.08e68f5b@gara>

On Sep 26, 2008  09:49 -0500, Jose R. Santos wrote:
> Agreed, but performance wise this way is more consistent with the
> current block and inode allocators.  The block allocator will start its
> free block search on the block group that contains the inode.  Since
> these block groups do not contain any blocks, the block allocator will
> have to be modified to make sure data is not being placed randomly on
> the disk.

This is already the case today when a block group is full.  The block
allocator needs to handle this gracefully.
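
To make the fallback concrete, here is a minimal sketch of what
"handling it gracefully" could look like.  It is not the ext4
allocator (mballoc is far more involved); the function name and the
simple linear wrap-around search are assumptions made for
illustration only.

```python
def find_group_with_free_blocks(free_blocks, goal_group):
    """Return the first group, starting at goal_group and wrapping
    around, that still has free blocks; None if every group is full.

    free_blocks is a hypothetical per-group free block count list.
    A group that holds only inode tables simply reports 0 here, the
    same as a group that has filled up.
    """
    ngroups = len(free_blocks)
    for offset in range(ngroups):
        group = (goal_group + offset) % ngroups
        if free_blocks[group] > 0:
            return group
    return None
```

With free_blocks = [0, 0, 5, 3] and a goal of group 0, the search
skips the two empty groups and settles on group 2, so data stays as
close to the goal as the free space allows.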

> The flex_bg inode allocator would also need to be modified since
> it currently depends on an algorithm that assumes that block groups
> contain actual blocks.  One of the things that got flex_bg added to
> ext4 in the first place was the performance improvements it
> provided.  I would like to keep that advantage if possible.

I don't think the performance advantage was at all related to inode->block
locality (since this is actually worse with FLEX_BG) but rather better
metadata locality (e.g. contiguous bitmaps, itables avoiding seeking
during metadata operations).
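
As a rough illustration of the metadata grouping, the sketch below
maps a block group to its flex group using the superblock's
s_log_groups_per_flex value.  The helper names are hypothetical, and
where exactly the bitmaps and itables land inside the flex group
depends on how the filesystem was formatted.

```python
def flex_group(group, log_groups_per_flex):
    """Index of the flex group that a block group belongs to."""
    return group >> log_groups_per_flex


def first_group_in_flex(group, log_groups_per_flex):
    """First block group of that flex group.  The packed bitmaps and
    inode tables are assumed to sit near here, which is why an inode
    in a late group can be far from its own group's data blocks."""
    return flex_group(group, log_groups_per_flex) << log_groups_per_flex
```

With 16 groups per flex group (log value 4), group 37 falls in flex
group 2, whose first member is group 32.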

> This could also be used to speed up mkfs since we would not need to zero
> out as many inode tables.  We could initialize just a couple of inode
> tables per flex_bg group and allocate the rest dynamically.

There is already the ability to avoid zeroing ANY inode tables with
uninit_bg, but it is unsafe to do this in production because the old
itable data is there and e2fsck might become confused if the group's
bg_itable_unused value is lost (due to gdt corruption or other
inconsistency).
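
A small sketch of why losing that value matters.  The flag constant
matches the ext4 on-disk format; the function and the desc_valid
parameter are hypothetical and are not e2fsck's actual logic.

```python
EXT4_BG_INODE_UNINIT = 0x0001  # inode table and bitmap not initialized


def inodes_to_scan(desc_valid, bg_flags, bg_itable_unused, inodes_per_group):
    """How many inodes a checker would have to read from one group.

    desc_valid is an assumed input meaning "the group descriptor can
    be trusted".  When it cannot, the whole table has to be read, and
    any part that was never zeroed still holds old itable data.
    """
    if not desc_valid:
        return inodes_per_group
    if bg_flags & EXT4_BG_INODE_UNINIT:
        return 0
    return inodes_per_group - bg_itable_unused
```

With 8192 inodes per group and bg_itable_unused = 8000, a trusted
descriptor limits the scan to 192 inodes.  Once the descriptor is
lost, all 8192 slots are read, including whatever stale inodes were
left behind by an earlier filesystem.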

> You do pay
> a small penalty when allocating a new inode table since we first need
> to find the blocks for that inode table as well as zeroing it afterward.
> The penalty is less than if we do the one time background zeroing of
> inode tables where your disk will be thrashing for a while the first
> time it is mounted.

I don't think it is any different.  The itable zeroing is _still_ needed,
because the flag that indicates if an itable is used or not is unreliable
in some corruption cases, and we don't want to read garbage from disk.
IMHO when a filesystem is first formatted and mounted it is probably
mostly idle, and if not the zeroing (and other stuff) thread can be delayed
(e.g. in a new distro install maybe the itables aren't zeroed until the
second or third mount, no great loss/risk).
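
A sketch of that deferral policy, under stated assumptions: the
EXT4_BG_INODE_ZEROED flag is part of the on-disk format, but the
function names, the is_idle input and the three-mount threshold are
made up for illustration.

```python
EXT4_BG_INODE_ZEROED = 0x0004  # on-disk inode table has been zeroed


def groups_needing_zeroing(group_flags):
    """Groups whose inode tables have not been zeroed yet."""
    return [g for g, flags in enumerate(group_flags)
            if not flags & EXT4_BG_INODE_ZEROED]


def should_zero_now(mount_count, is_idle, max_deferred_mounts=3):
    """Start the background zeroing if the filesystem is idle, or if
    it has already been put off for max_deferred_mounts mounts."""
    return is_idle or mount_count >= max_deferred_mounts
```

A busy first mount (mount_count 1, not idle) would skip the zeroing,
while an idle mount or the third mount would run it over the groups
returned by groups_needing_zeroing().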

> If supporting already existing filesystems is really important we could
> always implement both techniques since they technically should not
> conflict with each other, though you couldn't use both of them at the
> same time if you have a 1:1 block/inode ratio.

IMHO supporting dynamic inode tables on existing filesystems is the MAIN goal.
Once you know you have run out of inodes it is already too late to plan
for it, and if you need a reformat to implement this scheme you could
just as easily reformat with enough inodes in the first place :-).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
