From: Ted Ts'o <tytso@mit.edu>
To: linux-ext4@vger.kernel.org
Subject: Re: Proposed design for big allocation blocks for ext4
Date: Fri, 25 Feb 2011 18:40:02 -0500
Message-ID: <20110225234002.GA2924@thunk.org>
In-Reply-To: <20110225215924.GA28214@noexit>
On Fri, Feb 25, 2011 at 01:59:25PM -0800, Joel Becker wrote:
>
> Why not call it a 'cluster' like the rest of us do? The term
> 'blocksize' is overloaded enough already.
Yes, good point. Allocation cluster makes a lot more sense as a name.
> > 3) mballoc.c will need little or no changes, other than the
> > EXT4_BLOCKS_PER_GROUP()/EXT4_ALLOC_BLOCKS_PER_GROUP() audit discussed
> > in (1).
>
> Be careful in your zeroing. A new allocation block might have
> pages at its front that are not part of the write() or mmap(). You'll
> either need to keep track that they are uninitialized, or you will have
> to zero them in write_begin() (ocfs2 does the latter). We've had quite
> a few tricky bugs in this area, because the standard pagecache code
> handles the pages covered by the write, but the filesystem has to handle
> the new pages outside the write.
We're going to keep track of which blocks are uninitialized on a 4k
basis, so that part of the ext4 code doesn't change.
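
To make the concern above concrete, here is a rough sketch (not ext4 or
ocfs2 code) of which byte ranges of a newly allocated cluster fall
outside a write and so must either be zeroed or tracked as
uninitialized. The names struct byte_range and uncovered_ranges() are
hypothetical, and the sketch assumes that every cluster the write
touches is newly allocated, that len is nonzero, and that both sizes
are powers of two.

```c
#include <stdint.h>

/* Hypothetical type for illustration; not part of any kernel API. */
struct byte_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Both helpers assume align is a power of two. */
static inline uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

static inline uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * For a write of len bytes at file offset pos, compute the block-aligned
 * ranges inside the surrounding cluster(s) that the write does not touch:
 *   head: from the start of the first cluster up to the first written block
 *   tail: from the end of the last written block to the end of the last cluster
 * An empty range has start == end.
 */
void uncovered_ranges(uint64_t pos, uint64_t len,
		      uint64_t block_size, uint64_t cluster_size,
		      struct byte_range *head, struct byte_range *tail)
{
	uint64_t end = pos + len;

	head->start = round_down_pow2(pos, cluster_size);
	head->end   = round_down_pow2(pos, block_size);
	tail->start = round_up_pow2(end, block_size);
	tail->end   = round_up_pow2(end, cluster_size);
}
```

For example, a 5000-byte write at offset 10000 with 4 KiB blocks and an
assumed 64 KiB cluster leaves a head of [0, 8192) and a tail of
[16384, 65536). Only the blocks in [8192, 16384) are touched by the
write; everything else in the cluster is the filesystem's problem.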
That being said, one of my primary design mantras for ext4 is, "we're
not going to optimize for sparse files". They should work for
correctness' sake, but if the file system isn't at its most performant
in the case of sparse files, I'm not going to shed any tears.
> It's a huge win for anything needing large files, like database
> files or VM images. mkfs.ocfs2 has a vmimage mode just for this ;-)
> Even with good allocation code and proper extents, a long-lived
> filesystem with 4K clusters just gets fragmented. This leads to later
> files being very discontiguous, which are slow to I/O to. I think this
> is much more important than the simple speed-of-allocation win.
Yes, very true.
> > Directories will also be allocated in chunks of the allocation block
> > size. If this is especially large (such as 1 MiB), and there are a
> > large number of directories, this could be quite expensive.
> > Applications which use multi-level directory schemes to keep
> > directories small to optimize for ext2's very slow large directory
> > performance could be especially vulnerable.
>
> Anecdotal evidence suggests that directories often benefit with
> clusters of 8-16K size, but suffer greatly after 128K for precisely the
> reasons you describe. We usually don't recommend clusters greater than
> 32K for filesystems that aren't expressly for large things.
Yes. I'm going to assume that file systems optimized for large files
are (in general) not going to have lots of directories, and even if
they do, chewing up a megabyte for a directory isn't that big of a
deal if you're talking about a 2-4TB disk.
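
A back-of-the-envelope sketch of that arithmetic, with hypothetical
helper names and made-up inputs: assume 10,000 small directories that
each fit in a single cluster, and a 2 TiB disk (the low end of the
range above, read as binary units). The result is in parts per million
to avoid floating point.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Space consumed if every directory occupies exactly one cluster. */
uint64_t dir_min_bytes(uint64_t ndirs, uint64_t cluster_size)
{
	return ndirs * cluster_size;
}

/* The same figure as a fraction of the disk, in parts per million. */
uint64_t dir_overhead_ppm(uint64_t ndirs, uint64_t cluster_size,
			  uint64_t disk_bytes)
{
	return dir_min_bytes(ndirs, cluster_size) * 1000000ULL / disk_bytes;
}
```

With 1 MiB clusters, those 10,000 directories take 10,485,760,000
bytes (about 9.8 GiB), or 4768 ppm (roughly 0.48%) of a 2 TiB disk.
With 4 KiB allocation units the same directories take about 39 MiB,
or 18 ppm.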
We could add complexity to do suballocations for directories, but KISS
seems to be a much better idea for now.
- Ted