From: Chris Mason <chris.mason@oracle.com>
To: "Ted Ts'o" <tytso@mit.edu>,
Jacek Luczak <difrost.kernel@gmail.com>,
linux-ext4@vger.kernel.org,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-btrfs@vger.kernel.org
Subject: Re: getdents - ext4 vs btrfs performance
Date: Fri, 2 Mar 2012 14:50:45 -0500
Message-ID: <20120302195045.GA28296@shiny>
In-Reply-To: <20120302193215.GB22215@thunk.org>
On Fri, Mar 02, 2012 at 02:32:15PM -0500, Ted Ts'o wrote:
> On Fri, Mar 02, 2012 at 09:26:51AM -0500, Chris Mason wrote:
> >
> > filefrag will tell you how many extents each file has, any file with
> > more than one extent is interesting. (The ext4 crowd may have better
> > suggestions on measuring fragmentation).
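As a rough illustration of the filefrag check quoted above, the sketch below parses filefrag's per-file summary lines ("<path>: N extents found") and keeps only the files with more than one extent. The function name and the sample text are made up for illustration; in practice the input would be the captured output of running filefrag over the tree.

```python
import re

# Matches filefrag summary lines such as "/mnt/build/a.o: 3 extents found".
_SUMMARY = re.compile(r"^(?P<path>.+): (?P<count>\d+) extents? found$")


def fragmented_files(filefrag_output, min_extents=2):
    """Return {path: extent_count} for files with at least min_extents extents."""
    result = {}
    for line in filefrag_output.splitlines():
        match = _SUMMARY.match(line.strip())
        if match is None:
            continue  # not a summary line (e.g. verbose extent listings)
        count = int(match.group("count"))
        if count >= min_extents:
            result[match.group("path")] = count
    return result


if __name__ == "__main__":
    # Hypothetical sample output, not taken from the thread.
    sample = (
        "/mnt/build/a.o: 1 extent found\n"
        "/mnt/build/b.o: 4 extents found\n"
    )
    for path, count in sorted(fragmented_files(sample).items()):
        print(count, path)
```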
>
> You can get a *huge* amount of information (probably more than you'll
> want to analyze) by doing this:
>
> e2fsck -nf -E fragcheck /dev/XXXX >& /tmp/fragcheck.out
>
> I haven't had time to do this in a while, but a while back I used this
> to debug the writeback code with an eye towards reducing
> fragmentation. At the time I was trying to optimize the case of
> reducing fragmentation in the easiest case possible, where you start
> with an empty file system, and then copy all of the data from another
> file system onto it using rsync -avH.
>
> It would be worth while to see what happens with files written by the
> compiler and linker. Given that libelf tends to write .o files
> non-sequentially, and without telling us how big the space is in
> advance, I could well imagine that we're not doing the best job
> avoiding free space fragmentation, which eventually leads to extra
> file system aging.
I just realized that I confused things. He's doing a read on the
results of a cp -a to a fresh FS, so there's no way the compiler/linker
are causing trouble.
>
> It would be interesting to have a project where someone added
> fallocate() support into libelf, and then added some heuristics into
> ext4 so that if a file is fallocated to a precise size, or if the file
> is fully written and closed before writeback begins, that we use this
> to more efficiently pack the space used by the files by the block
> allocator. This is a place where I would not be surprised that XFS
> has some better code to avoid accelerated file system aging, and where
> we could do better with ext4 with some development effort.
The part I don't think any of us have solved is writing back the files
in a good order after we've fallocated the blocks.
So this will probably be great for reads and not so good for writes.
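For reference, the userspace half of that idea (tell the file system the exact final size before writing any data) might look roughly like the sketch below. It uses Python's os.posix_fallocate, which wraps posix_fallocate(3); the helper name is hypothetical, and nothing here says how libelf would do it or how the ext4 allocator would use the hint. It also does nothing about the writeback-ordering problem.

```python
import os


def write_preallocated(path, data):
    """Reserve exactly len(data) bytes for path, then write the data."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        if data:
            # Announce the final size up front; a zero length is rejected
            # by posix_fallocate, so empty files skip this step.
            os.posix_fallocate(fd, 0, len(data))
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than asked; loop until done.
            written += os.pwrite(fd, data[written:], written)
    finally:
        os.close(fd)
```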
>
> Of course, it might also be possible to hack around this by simply
> using VPATH and dropping your build files in a separate place from
> your source files, and periodically reformatting the file system where
> your build tree lives. (As a side note, something that works well for
> me is to use an SSD for my source files, and a separate 5400rpm HDD
> for my build tree. That allows me to use a smaller and more
> affordable SSD, and since the object files can be written
> asynchronously by the writeback threads, but the compiler can't move
> forward until it gets file data from the .c or .h file, it gets me the
> best price/performance for a laptop build environment.)
mkfs for defrag ;) It's the only way to be sure.
>
> BTW, I suspect we could make acp even more efficient by teaching it to
> use the FIEMAP ioctl to map out the data blocks for all of the files in
> the source file system, and then copy the files (or perhaps even
> parts of files) in a read order which reduces seeking on the source
> drive.
acp does have a -b mode where it fibmaps (I was either lazy or it is
older than fiemap, I forget) the first block in the file, and uses that
to sort. It does help if the file blocks aren't ordered well wrt their
inode numbers, but not if the files are fragmented.
It's also worth mentioning that acp doesn't actually cp. I never got
that far. It was supposed to be the perfect example of why everything
should be done via aio, but it just ended up demonstrating that ordering
by inode number and leveraging kernel/hardware readahead were more
important.
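A minimal sketch of the inode-ordering idea, for anyone who wants to try it: read the files of a directory in inode-number order with plain buffered reads and let the kernel's readahead do the rest. This is not acp's code; the function names are made up, and it only handles a single flat directory.

```python
import os


def paths_in_inode_order(directory):
    """Return regular-file paths in directory, sorted by inode number."""
    with os.scandir(directory) as entries:
        files = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    files.sort()  # tuples sort by inode number first
    return [path for _, path in files]


def read_all(directory, chunk_size=1 << 20):
    """Read every file in inode order; return the total bytes read."""
    total = 0
    for path in paths_in_inode_order(directory):
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total
```

Sorting on the first physical block (what -b does) would need FIBMAP or FIEMAP in place of entry.inode(), and FIBMAP generally requires root.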
-chris