From: tytso@mit.edu
To: Daniel Taylor <Daniel.Taylor@wdc.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: inconsistent file placement
Date: Tue, 6 Jul 2010 14:55:48 -0400
Message-ID: <20100706185548.GA26677@thunk.org>
In-Reply-To: <469D2D911E4BF043BFC8AD32E8E30F5B24AED8@wdscexbe07.sc.wdc.com>
On Mon, Jul 05, 2010 at 06:49:34PM -0700, Daniel Taylor wrote:
> I realize that it is generally not a good idea to tune
> an operating system, or subsystem, for benchmarking, but
> there's something that I don't understand about ext[234]
> that is badly affecting our product. File placement on
> newly-created file systems is inconsistent. I can't,
> yet, call it a bug, but I really need to understand what
> is happening, and I cannot find, in the source code, the
> source of the randomization (related to "goal"???).
In ext3, it really is random. The randomness you're looking for can
be found in fs/ext3/ialloc.c:find_group_orlov(), when it calls
get_random_bytes(). That call is what "spreads" directories across
the block groups, to try to keep files from becoming fragmented. Yes,
if all you care about is benchmarks which use only 10% of the file
system and which don't adequately simulate file system aging, the
algorithms in ext3 will cause a lot of variability.
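
To make the source of the variability concrete, here is a minimal
sketch of the idea, not the kernel code. The function name, the
free-inode list and the use of random.Random as a stand-in for
get_random_bytes() are all assumptions made for illustration; the real
find_group_orlov() weighs more than free inode counts.

```python
import random


def pick_group_for_top_level_dir(free_inodes, rng):
    """Pick a block group for a new top-level directory (illustrative only).

    free_inodes: per-group free inode counts (hypothetical input).
    rng: a random.Random instance, standing in for get_random_bytes().
    Returns a group index, or -1 if no group has a free inode.
    """
    ngroups = len(free_inodes)
    avg_free = sum(free_inodes) / ngroups
    start = rng.randrange(ngroups)  # the random starting point
    for i in range(ngroups):
        group = (start + i) % ngroups
        # Scanning from the random start, take the first group that is
        # at least as empty as the average group.
        if free_inodes[group] > 0 and free_inodes[group] >= avg_free:
            return group
    return -1


# On a freshly created file system every group looks the same, so the
# result is decided entirely by the random starting point.
fresh_fs = [8192] * 32
placements = {
    pick_group_for_top_level_dir(fresh_fs, random.Random(seed))
    for seed in range(50)
}
```

On an empty file system nothing distinguishes one group from another,
so two otherwise identical runs can land the same directory in
different groups.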
Yes, if you use FAT-style algorithms which try to use the first free
inode, and first free block which is available, for the purposes of
competitive benchmarking (especially if the benchmarks are crap), you
can probably win against the competition. Unfortunately, long-term
your product will probably be far more likely to suffer from file system
aging, as the blocks at the beginning of the file system become badly
fragmented. Please don't do that, though (or, if you must, please
have a switch so that users can switch it from "competitive
benchmarking mode" to "friendly to real life users" mode).
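
A toy illustration of the aging problem, assuming a 16-block file
system and a hypothetical first_free_alloc() helper (neither comes from
any real file system): once a file in the middle has been deleted, a
first-free allocator fills the hole and then spills past the later
files, so the new file ends up in two pieces.

```python
def first_free_alloc(bitmap, nblocks):
    """Allocate nblocks from the lowest-numbered free blocks (FAT-style)."""
    if nblocks <= 0:
        return []
    got = []
    for blk, used in enumerate(bitmap):
        if not used:
            bitmap[blk] = True
            got.append(blk)
            if len(got) == nblocks:
                return got
    # Not enough free blocks: undo the partial allocation.
    for blk in got:
        bitmap[blk] = False
    raise OSError("no space left")


def count_extents(blocks):
    """Count runs of consecutive block numbers in an ascending list."""
    return sum(
        1 for i, blk in enumerate(blocks) if i == 0 or blk != blocks[i - 1] + 1
    )


bitmap = [False] * 16                # toy file system, all blocks free
file_a = first_free_alloc(bitmap, 4)
file_b = first_free_alloc(bitmap, 4)
file_c = first_free_alloc(bitmap, 4)
for blk in file_b:                   # "age" the file system: delete file_b
    bitmap[blk] = False
file_d = first_free_alloc(bitmap, 6)  # fills the hole, then spills over
```

Here file_d is given blocks 4-7 and 12-13, i.e. two extents, even
though the file system is only three-quarters full.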
Ext4 uses very different algorithms, and it's not strictly speaking
random since it uses a cut-down md4 hash of the directory name to
decide where to place the directory inode (and the location of the
directory inode affects both the files created in that directory and
the blocks allocated to those files, as in ext3). So as long as
the directory hash seed in the superblock stays constant, and the
directory and file names created stay constant, the inode and block
layout will also be consistent.
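
A sketch of why that makes the layout repeatable. SHA-256 is used here
purely as a stand-in for the cut-down md4 hash, and the function name,
the all-zero seed and the group count are assumptions for illustration;
the real allocator does more than take a hash modulo the group count.

```python
import hashlib


def group_for_directory(name, hash_seed, ngroups):
    """Map a directory name to a starting block group (illustrative only).

    hash_seed plays the role of the directory hash seed stored in the
    superblock. SHA-256 is a stand-in, not the hash ext4 actually uses.
    """
    digest = hashlib.sha256(hash_seed + name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % ngroups


seed = bytes(16)  # hypothetical 16-byte hash seed
first_run = group_for_directory("bench-run-1", seed, 128)
second_run = group_for_directory("bench-run-1", seed, 128)
```

With the same seed and the same name, first_run and second_run are
always equal; change either input and the starting group may move.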
All of this having been said, it may very well be possible to improve
on the anti-fragmentation algorithms while still trying to allocate
block groups closer to the beginning of the disk to take advantage of
the inner-diameter/outer-diameter placement effect. There's probably
room for some research work here. But please do be careful before
twiddling too much with the allocator algorithms; they are somewhat
subtle....
- Ted