From: "Abhishek Rai" <abhishekrai@google.com>
To: "Andreas Dilger" <adilger@sun.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH] Clustering indirect blocks in Ext2
Date: Thu, 25 Oct 2007 17:31:45 -0700
Message-ID: <d9885f0f0710251731k46525380o288a3fffca9dff@mail.gmail.com>
In-Reply-To: <20071025233820.GL3042@webber.adilger.int>

On 10/25/07, Andreas Dilger <adilger@sun.com> wrote:
> On Oct 25, 2007 15:56 -0700, Abhishek Rai wrote:
> > While this patch does add some complexity to ext2, it has the benefit
> > of backward and forward compatibility which will probably make it
> > attractive for more people than any change that changes on-disk
> > format.
>
> To be honest, I think the number of people using ext2 on their systems
> is relatively small compared to ext3 because of the e2fsck hit on each
> boot. IMHO, that means the engineering effort spent on improving
> e2fsck for ext2 is less worthwhile than if the same effort was spent
> on testing ext4 and the improvements made there.
>
> My understanding is that the primary reason Google is using ext2 instead
> of ext3 is because of the performance impact of journaling. With the
> performance (and also scalability) improvements in ext4, doesn't it make
> sense to put test/development time and effort toward ext4?

That is true. However, this patch is meant as a stopgap fix for ext2
users until ext4 becomes stable enough for them to migrate to it. To
reach a wider user base, I'll also port it to ext3. On ext3 the
benefits won't be seen on every reboot, of course, but a full fsck is
still needed there every once in a while. I've seen this patch reduce
e2fsck time by 50-65% without any fsck changes, and with a minor fsck
change we've seen e2fsck time go down by 80%.

> > Thanks for pointing these out. extents and delalloc+mballoc are of
> > course useful but are not a simple transition though I'm definitely
> > considering trying them out.
>
> Note that delalloc and mballoc don't strictly require extents, as
> they are in-memory optimizations only.
>
> > Regarding the uninit_groups patch, I think it can be implemented in a
> > backward compatible way as follows. Instead of modifying the group
> > desc to store the number of unused inodes (bg_itable_inodes), we can
> > alternatively define an implicit boundary in every group's inode
> > bitmap by having a special free "marker" inode with a certain
> > signature. Whenever we need to allocate inodes in a group beyond this
> > boundary, we shift the boundary by using a later inode as the free
> > marker inode. The idea is that new ext2 will try to allocate inodes
> > from before the marker and fsck will not seek past the marker.
>
> The problem with this is that ext2 is not journalled and it is possible
> that updates are not ordered on disk. The danger is that the update
> of the marker block is lost, but inodes are allocated after it.

I don't agree. Since moving the marker is meant to be a rare operation,
it should be OK to do a sync write and really wait for it to complete.
The write may still sit in the disk's cache, but that is true of all
metadata updates on ext2. Fortunately, even if we don't manage to write
out the new marker-inode, fsck will assume that there is no marker,
which is the same as the old behavior anyway. However, I do realize
that storing the inode count in the group descriptor can perform better
than marker-inodes: if bg_itable_inodes == 0, we can entirely skip
reading inodes from a block group (though that is not entirely true).
But I think there is a way or two to achieve the same effect with
marker-inodes as well. I'll work on a patch and see how it goes.

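To make the marker idea concrete, here is a rough toy model in Python.
It is only a sketch of the scheme described above, not the patch: the
names (Group, alloc, fsck_scan_count), the slot states, and the fixed
step by which the marker advances are all made up for illustration, and
the list stands in for a group's inode table rather than any on-disk
layout.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical slot states for the toy inode table.
FREE, USED, MARKER = "free", "used", "marker"


@dataclass
class Group:
    """Toy model of one block group's inode table (not the ext2 layout)."""
    table: List[str]

    @classmethod
    def fresh(cls, n_inodes: int, marker_at: int) -> "Group":
        table = [FREE] * n_inodes
        table[marker_at] = MARKER
        return cls(table)

    def marker(self) -> Optional[int]:
        """Index of the marker slot, or None if the group has no marker."""
        try:
            return self.table.index(MARKER)
        except ValueError:
            return None

    def alloc(self, step: int = 4) -> int:
        """Allocate the lowest free slot before the marker.

        If nothing is free before the marker, move the marker forward by
        `step` slots (the rare path that would be a sync write) and hand
        out the old marker slot. If the marker would run off the end of
        the table it is dropped, which gives the old behavior.
        """
        m = self.marker()
        limit = len(self.table) if m is None else m
        for i in range(limit):
            if self.table[i] == FREE:
                self.table[i] = USED
                return i
        if m is None:
            raise RuntimeError("no free inodes in this group")
        new_m = m + step
        if new_m < len(self.table):
            self.table[new_m] = MARKER  # place the new marker first
        self.table[m] = USED            # then reuse the old marker slot
        return m

    def fsck_scan_count(self) -> int:
        """Number of inode slots a marker-aware fsck would have to read."""
        m = self.marker()
        return len(self.table) if m is None else m
```

With Group.fresh(16, 4), the first four allocations return slots 0-3
and fsck_scan_count() stays at 4; the fifth allocation returns slot 4
and moves the marker to slot 8. If the new marker never reaches disk
while the reuse of the old marker slot does, marker() finds nothing and
fsck_scan_count() falls back to the whole table. The opposite ordering
(old marker still on disk, inodes allocated past it) is the case
Andreas is worried about, and this model does not try to handle it.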
> > - Over time markers drift towards higher inode numbers but never
> > travel backwards, so a pathological workload can kill all markers
> > bringing us back to old behavior, but this is very unlikely.
>
> This is currently true of the uninit_groups feature also, because it
> is a lot easier to avoid the problem of safely shrinking the high
> watermark. On the next e2fsck it will shrink the high watermark for
> each group again.
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Software Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
Thanks,
Abhishek