* Ext4 without a journal: some benchmark results
@ 2009-01-07 19:29 Curt Wohlgemuth
2009-01-07 20:47 ` Theodore Tso
2009-01-08 13:03 ` Andreas Dilger
0 siblings, 2 replies; 6+ messages in thread
From: Curt Wohlgemuth @ 2009-01-07 19:29 UTC (permalink / raw)
To: linux-ext4
Hi:
I promised back in mid-December to send out some benchmark numbers I'm
seeing with Frank Mayhar's work to allow ext4 to run without a journal. My
apologies for the delay...
I ran both iozone and compilebench on the following filesystems, using a
2.6.26-based kernel with most ext4 patches applied. This is on an x86-based
4-core system, with a separate disk for these runs.
ext2, default create/mount options
ext3, default create/mount options
ext4, default create/mount options
ext4, created with "-O ^has_journal"
For each filesystem, I ran each benchmark twice, doing a mke2fs before each
run. The same disk was used for each run; all benchmarks ran in the mount
directory of the newly mkfs'ed disk. I averaged the values for the two runs
for each FS/thread number.
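With only two runs per configuration, the reported mean and standard deviation
come down to simple arithmetic; a small awk helper of roughly this shape
reproduces the format used in the tables below (the sample inputs are
illustrative, not taken from the actual runs):

```shell
#!/bin/sh
# Mean and population stddev of two benchmark runs, rounded to one
# decimal place (the precision used in the tables below).
avg2() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        mean = (a + b) / 2
        # For two samples, the population stddev is |a - b| / 2.
        sd = (a > b ? a - b : b - a) / 2
        printf "%.1f MB/s (%.1f)\n", mean, sd
    }'
}

avg2 56.4 56.8   # -> 56.6 MB/s (0.2)
```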
Iozone was run with the following command line:
iozone -t (# threads) -s 2g -r 256k -I -T -i0 -i1 -i2
I.e., throughput mode; 2GiB file; 256KiB buffer; O_DIRECT. Tests were
limited to
write/rewrite
read/re-read
random-read/write
I ran iozone twice for each FS: with a single thread (-t 1) and with 8
threads (-t 8).
Compilebench was run with the following command line:
compilebench -D (mount dir) -i 10 -r 30
I.e., 10 kernel trees, 30 "random operation" runs.
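The overall procedure (fresh mke2fs before every run, two runs per filesystem,
benchmarks executed against the mount point) can be sketched as follows. This
is a dry-run version that only prints the commands it would issue, since the
device and mount point (/dev/sdX1, /mnt/bench) are placeholders rather than
the actual test hardware:

```shell
#!/bin/sh
# Dry-run sketch of the per-filesystem benchmark loop described above.
# DEV and MNT are placeholders for the real test disk and mount point.
DEV=/dev/sdX1
MNT=/mnt/bench

bench_cmds() {
    fs=$1; mkfs_opts=$2
    echo "mke2fs -F -t $fs $mkfs_opts $DEV"
    echo "mount -t $fs $DEV $MNT"
    echo "iozone -t 1 -s 2g -r 256k -I -T -i0 -i1 -i2"
    echo "iozone -t 8 -s 2g -r 256k -I -T -i0 -i1 -i2"
    echo "compilebench -D $MNT -i 10 -r 30"
    echo "umount $MNT"
}

for run in 1 2; do        # two runs per filesystem, mkfs before each
    bench_cmds ext2 ""
    bench_cmds ext3 ""
    bench_cmds ext4 ""
    bench_cmds ext4 "-O ^has_journal"
done
```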
Results follow.
Thanks,
Curt
Iozone
======
ext2 : 1 thread
---------------
Average throughput:
Type Mean Stddev
initial_writers: 56.6 MB/s ( 0.2)
rewriters: 58.4 MB/s ( 0.2)
readers: 66.3 MB/s ( 0.2)
re-readers: 66.5 MB/s ( 0.0)
random_readers: 22.4 MB/s ( 0.1)
random_writers: 18.8 MB/s ( 0.0)
ext2 : 8 threads
----------------
Average throughput:
Type Mean Stddev
initial_writers: 28.5 MB/s ( 0.0)
rewriters: 43.5 MB/s ( 0.1)
readers: 51.5 MB/s ( 0.1)
re-readers: 51.8 MB/s ( 0.2)
random_readers: 20.3 MB/s ( 0.0)
random_writers: 17.3 MB/s ( 0.0)
ext3 : 1 thread
----------------
Average throughput:
Type Mean Stddev
initial_writers: 56.3 MB/s ( 0.2)
rewriters: 58.2 MB/s ( 0.1)
readers: 66.4 MB/s ( 0.1)
re-readers: 66.1 MB/s ( 0.2)
random_readers: 22.1 MB/s ( 0.1)
random_writers: 18.6 MB/s ( 0.1)
ext3 : 8 threads
----------------
Average throughput:
Type Mean Stddev
initial_writers: 28.7 MB/s ( 0.1)
rewriters: 43.2 MB/s ( 0.2)
readers: 51.5 MB/s ( 0.0)
re-readers: 51.5 MB/s ( 0.0)
random_readers: 20.2 MB/s ( 0.0)
random_writers: 17.3 MB/s ( 0.0)
ext4-nojournal : 1 thread
-------------------------
Average throughput:
Type Mean Stddev
initial_writers: 66.3 MB/s ( 0.2)
rewriters: 66.6 MB/s ( 0.1)
readers: 66.4 MB/s ( 0.0)
re-readers: 66.4 MB/s ( 0.0)
random_readers: 22.4 MB/s ( 0.1)
random_writers: 19.4 MB/s ( 0.2)
ext4-nojournal : 8 threads
--------------------------
Average throughput:
Type Mean Stddev
initial_writers: 56.1 MB/s ( 0.1)
rewriters: 60.3 MB/s ( 0.2)
readers: 61.0 MB/s ( 0.0)
re-readers: 61.0 MB/s ( 0.0)
random_readers: 20.4 MB/s ( 0.1)
random_writers: 18.3 MB/s ( 0.1)
ext4-stock : 1 thread
----------------------
Average throughput:
Type Mean Stddev
initial_writers: 65.5 MB/s ( 0.1)
rewriters: 65.7 MB/s ( 0.2)
readers: 65.8 MB/s ( 0.2)
re-readers: 65.6 MB/s ( 0.3)
random_readers: 21.9 MB/s ( 0.0)
random_writers: 19.1 MB/s ( 0.1)
ext4-stock : 8 threads
----------------------
Average throughput:
Type Mean Stddev
initial_writers: 53.7 MB/s ( 0.2)
rewriters: 58.3 MB/s ( 0.1)
readers: 58.8 MB/s ( 0.1)
re-readers: 59.0 MB/s ( 0.1)
random_readers: 20.2 MB/s ( 0.0)
random_writers: 18.1 MB/s ( 0.0)
Compilebench
============
ext2
----
Average values:
Type Mean Stddev
initial_create: 57.9 MB_s ( 1.9)
new_create: 13.0 MB_s ( 0.2)
patch: 7.3 MB_s ( 0.1)
compile: 25.6 MB_s ( 0.6)
clean: 70.4 MB_s ( 1.3)
read_tree: 22.1 MB_s ( 0.0)
read_compiled_tree: 33.3 MB_s ( 0.2)
delete_tree: 6.5 secs ( 0.2)
stat_tree: 5.2 secs ( 0.0)
stat_compiled_tree: 5.7 secs ( 0.1)
ext3
----
Average values:
Type Mean Stddev
initial_create: 30.6 MB_s ( 2.2)
new_create: 13.5 MB_s ( 0.2)
patch: 10.6 MB_s ( 0.1)
compile: 18.0 MB_s ( 0.3)
clean: 41.7 MB_s ( 1.8)
read_tree: 21.5 MB_s ( 0.2)
read_compiled_tree: 20.4 MB_s ( 1.1)
delete_tree: 13.5 secs ( 0.3)
stat_tree: 6.7 secs ( 0.4)
stat_compiled_tree: 9.6 secs ( 2.9)
ext4-nojournal
--------------
Average values:
Type Mean Stddev
initial_create: 77.1 MB_s ( 0.2)
new_create: 22.0 MB_s ( 0.1)
patch: 13.1 MB_s ( 0.0)
compile: 36.0 MB_s ( 0.1)
clean: 592.4 MB_s (39.4)
read_tree: 17.8 MB_s ( 0.2)
read_compiled_tree: 22.1 MB_s ( 0.1)
delete_tree: 2.5 secs ( 0.0)
stat_tree: 2.2 secs ( 0.0)
stat_compiled_tree: 2.5 secs ( 0.0)
ext4-stock
----------
Average values:
Type Mean Stddev
initial_create: 59.7 MB_s ( 0.4)
new_create: 20.5 MB_s ( 0.0)
patch: 12.5 MB_s ( 0.0)
compile: 33.9 MB_s ( 0.2)
clean: 539.5 MB_s ( 3.6)
read_tree: 17.1 MB_s ( 0.1)
read_compiled_tree: 21.8 MB_s ( 0.1)
delete_tree: 2.7 secs ( 0.1)
stat_tree: 2.4 secs ( 0.0)
stat_compiled_tree: 2.5 secs ( 0.2)
^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Ext4 without a journal: some benchmark results
  2009-01-07 19:29 Ext4 without a journal: some benchmark results Curt Wohlgemuth
@ 2009-01-07 20:47 ` Theodore Tso
  2009-01-07 21:19   ` Curt Wohlgemuth
  2009-01-08 13:03 ` Andreas Dilger
  1 sibling, 1 reply; 6+ messages in thread
From: Theodore Tso @ 2009-01-07 20:47 UTC (permalink / raw)
To: Curt Wohlgemuth; +Cc: linux-ext4

On Wed, Jan 07, 2009 at 11:29:11AM -0800, Curt Wohlgemuth wrote:
>
> I ran both iozone and compilebench on the following filesystems, using a
> 2.6.26-based kernel with most ext4 patches applied. This is on an x86-based
> 4-core system, with a separate disk for these runs.

Curt, thanks for doing these test runs.  One interesting thing to note
is that even though ext3 was running with barriers disabled, and ext4
was running with barriers enabled, ext4 still showed consistently
better results.  (Or was this on an LVM/dm setup where barriers were
getting disabled?)

I took the liberty of reformatting the results so I could look at them
more easily:

Iozone, 1 Thread
Average throughput
                      ext2             ext3             ext4        ext4-nojournal
Type             Mean    Stddev   Mean    Stddev   Mean    Stddev   Mean    Stddev
initl_writers:   56.6 MB/s (0.2)  56.3 MB/s (0.2)  65.5 MB/s (0.1)  66.3 MB/s (0.2)
rewriters:       58.4 MB/s (0.2)  58.2 MB/s (0.1)  65.7 MB/s (0.2)  66.6 MB/s (0.1)
readers:         66.3 MB/s (0.2)  66.4 MB/s (0.1)  65.8 MB/s (0.2)  66.4 MB/s (0.0)
re-readers:      66.5 MB/s (0.0)  66.1 MB/s (0.2)  65.6 MB/s (0.3)  66.4 MB/s (0.0)
random_readers:  22.4 MB/s (0.1)  22.1 MB/s (0.1)  21.9 MB/s (0.0)  22.4 MB/s (0.1)
random_writers:  18.8 MB/s (0.0)  18.6 MB/s (0.1)  19.1 MB/s (0.1)  19.4 MB/s (0.2)

Iozone, 8 Threads
Average throughput
                      ext2             ext3             ext4        ext4-nojournal
Type             Mean    Stddev   Mean    Stddev   Mean    Stddev   Mean    Stddev
initl_writers:   28.5 MB/s (0.0)  28.7 MB/s (0.1)  53.7 MB/s (0.2)  56.1 MB/s (0.1)
rewriters:       43.5 MB/s (0.1)  43.2 MB/s (0.2)  58.3 MB/s (0.1)  60.3 MB/s (0.2)
readers:         51.5 MB/s (0.1)  51.5 MB/s (0.0)  58.8 MB/s (0.1)  61.0 MB/s (0.0)
re-readers:      51.8 MB/s (0.2)  51.5 MB/s (0.0)  59.0 MB/s (0.1)  61.0 MB/s (0.0)
random_readers:  20.3 MB/s (0.0)  20.2 MB/s (0.0)  20.2 MB/s (0.0)  20.4 MB/s (0.1)
random_writers:  17.3 MB/s (0.0)  17.3 MB/s (0.0)  18.1 MB/s (0.0)  18.3 MB/s (0.1)

Compilebench
Average values
                      ext2             ext3              ext4         ext4-nojournal
Type             Mean    Stddev   Mean    Stddev   Mean     Stddev   Mean     Stddev
init_create:     57.9 MB_s (1.9)  30.6 MB_s (2.2)   59.7 MB_s (0.4)   77.1 MB_s ( 0.2)
new_create:      13.0 MB_s (0.2)  13.5 MB_s (0.2)   20.5 MB_s (0.0)   22.0 MB_s ( 0.1)
patch:            7.3 MB_s (0.1)  10.6 MB_s (0.1)   12.5 MB_s (0.0)   13.1 MB_s ( 0.0)
compile:         25.6 MB_s (0.6)  18.0 MB_s (0.3)   33.9 MB_s (0.2)   36.0 MB_s ( 0.1)
clean:           70.4 MB_s (1.3)  41.7 MB_s (1.8)  539.5 MB_s (3.6)  592.4 MB_s (39.4)
read_tree:       22.1 MB_s (0.0)  21.5 MB_s (0.2)   17.1 MB_s (0.1)   17.8 MB_s ( 0.2)
read_compld:     33.3 MB_s (0.2)  20.4 MB_s (1.1)   21.8 MB_s (0.1)   22.1 MB_s ( 0.1)
delete_tree:      6.5 secs (0.2)  13.5 secs (0.3)    2.7 secs (0.1)    2.5 secs ( 0.0)
stat_tree:        5.2 secs (0.0)   6.7 secs (0.4)    2.4 secs (0.0)    2.2 secs ( 0.0)
stat_compld:      5.7 secs (0.1)   9.6 secs (2.9)    2.5 secs (0.2)    2.5 secs ( 0.0)

A couple of things to note.  If you were testing Frank's patches, I
made one additional optimization to his patch, which removed the
orphaned inode handling; this wasn't necessary if you're running
without the journal.  I'm not sure if this would be measurable in your
benchmarks, since the inodes that would be getting modified were
probably going to be dirtied and require writeback anyway, but you
might get slightly better numbers with the version of the patch I
ultimately pushed to Linus.

The other thing to note is that in Compilebench's read_tree, ext2 and
ext3 are scoring better than ext4.  This is probably related to ext4's
changes in its block/inode allocation heuristics, which is something
that we probably should look at as part of tuning exercises.  The
btrfs.boxacle.net benchmarks showed something similar, which I also
would attribute to changes in ext4's allocation policies.

						- Ted
* Re: Ext4 without a journal: some benchmark results
  2009-01-07 20:47 ` Theodore Tso
@ 2009-01-07 21:19   ` Curt Wohlgemuth
  2009-01-08  2:17     ` Theodore Tso
  0 siblings, 1 reply; 6+ messages in thread
From: Curt Wohlgemuth @ 2009-01-07 21:19 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

Hi Ted:

On Wed, Jan 7, 2009 at 12:47 PM, Theodore Tso <tytso@mit.edu> wrote:
> On Wed, Jan 07, 2009 at 11:29:11AM -0800, Curt Wohlgemuth wrote:
>>
>> I ran both iozone and compilebench on the following filesystems, using a
>> 2.6.26-based kernel with most ext4 patches applied. This is on an x86-based
>> 4-core system, with a separate disk for these runs.
>
> Curt, thanks for doing these test runs.  One interesting thing to note
> is that even though ext3 was running with barriers disabled, and ext4
> was running with barriers enabled, ext4 still showed consistently
> better results.  (Or was this on an LVM/dm setup where barriers were
> getting disabled?)

Nope.  Barriers were enabled for both ext4 versions below.

> A couple of things to note.  If you were testing Frank's patches, I
> made one additional optimization to his patch, which removed the
> orphaned inode handling; this wasn't necessary if you're running
> without the journal.  I'm not sure if this would be measurable in your
> benchmarks, since the inodes that would be getting modified were
> probably going to be dirtied and require writeback anyway, but you
> might get slightly better numbers with the version of the patch I
> ultimately pushed to Linus.

I see the change you pushed; I'll integrate this and see if the
numbers look any different.

> The other thing to note is that in Compilebench's read_tree, ext2 and
> ext3 are scoring better than ext4.  This is probably related to ext4's
> changes in its block/inode allocation heuristics, which is something
> that we probably should look at as part of tuning exercises.  The
> btrfs.boxacle.net benchmarks showed something similar, which I also
> would attribute to changes in ext4's allocation policies.

Can you enlighten me as to what aspect of block allocation might be
involved in the slowdown here?  Which block group these allocations
are made from?  Or something more low-level than that?

Thanks,
Curt
* Re: Ext4 without a journal: some benchmark results
  2009-01-07 21:19 ` Curt Wohlgemuth
@ 2009-01-08  2:17   ` Theodore Tso
  0 siblings, 0 replies; 6+ messages in thread
From: Theodore Tso @ 2009-01-08 2:17 UTC (permalink / raw)
To: Curt Wohlgemuth; +Cc: linux-ext4

On Wed, Jan 07, 2009 at 01:19:07PM -0800, Curt Wohlgemuth wrote:
> >
> > Curt, thanks for doing these test runs.  One interesting thing to note
> > is that even though ext3 was running with barriers disabled, and ext4
> > was running with barriers enabled, ext4 still showed consistently
> > better results.  (Or was this on an LVM/dm setup where barriers were
> > getting disabled?)
>
> Nope.  Barriers were enabled for both ext4 versions below.

Well, barriers won't matter in the nojournal case, but it's nice to
know that for these workloads, ext4-stock (w/journalling) is faster
even than ext3 w/o barriers.  That's probably not true with a
metadata-heavy workload with fsync's, such as fsmark, though.

> > The other thing to note is that in Compilebench's read_tree, ext2 and
> > ext3 are scoring better than ext4.  This is probably related to ext4's
> > changes in its block/inode allocation heuristics, which is something
> > that we probably should look at as part of tuning exercises.  The
> > btrfs.boxacle.net benchmarks showed something similar, which I also
> > would attribute to changes in ext4's allocation policies.
>
> Can you enlighten me as to what aspect of block allocation might be
> involved in the slowdown here?  Which block group these allocations
> are made from?  Or something more low-level than that?

Ext4's block allocation algorithms are quite different from ext3's,
but that's not what I'm worried about.  Ext4's mballoc algorithms are
much more aggressive about finding contiguous blocks, and that's a
good thing.  There may be some issues with how it decides to do its
locality group preallocation vs. streaming preallocation, but these
are all tactical issues that in the end probably don't make that big
of a difference.  There may also be some issues about which block
group mballoc chooses if its home block group is full, but I suspect
those are second-order issues.

The bigger problem is the strategic-level issue of how inodes are
allocated, in particular when new directories are allocated.  Ext4 is
much more aggressive about keeping subdirectories in the same block
group.  It also completely disables the Orlov allocator algorithm to
spread out top-level directories and directories (such as /home) that
would have the top-level directory flag set.  Indeed, the new ext4
allocation algorithm doesn't differentiate between directories and
inodes in its allocation algorithms at all.

My concern with the current algorithms is that for very short
benchmarks, it keeps everything very closely packed together at the
beginning of the filesystem, which is probably good for those
benchmarks.  But for more complex benchmarks and longer-lived
filesystems where aging is a concern, the lack of spreading may cause
a much bigger set of problems, especially in the long term.

There are some other changes I want to make that involve avoiding
putting inodes in block groups that are a multiple of the flex block
group size, since all of the inode table blocks and block/inode
allocation bitmaps are stored in those block groups, and reserving
the blocks in those block groups for directory blocks, but that
requires testing to make sure it makes sense.

						- Ted
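The spreading-versus-packing contrast Ted describes can be caricatured with a
toy model (this is a deliberately simplified illustration, not the kernel's
actual Orlov or ext4 allocator code):

```shell
#!/bin/sh
# Toy model of directory placement: round-robin spreading across block
# groups (Orlov-style, as ext2/ext3 do for top-level directories)
# versus packing everything into the first block group.
place_dirs() {
    policy=$1; ndirs=$2; ngroups=$3
    awk -v p="$policy" -v n="$ndirs" -v g="$ngroups" 'BEGIN {
        for (i = 0; i < n; i++) {
            if (p == "spread")
                grp = i % g     # round-robin across block groups
            else
                grp = 0         # pack into the first block group
            used[grp]++
        }
        # Report how many block groups ended up holding directories.
        cnt = 0
        for (grp in used) cnt++
        printf "%s: %d directories in %d of %d block groups\n", p, n, cnt, g
    }'
}

place_dirs spread 16 8   # -> spread: 16 directories in 8 of 8 block groups
place_dirs packed 16 8   # -> packed: 16 directories in 1 of 8 block groups
```

Packing wins on short benchmarks (less seeking); spreading leaves each
directory room to grow, which matters as the filesystem ages.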
* Re: Ext4 without a journal: some benchmark results
  2009-01-07 19:29 Ext4 without a journal: some benchmark results Curt Wohlgemuth
  2009-01-07 20:47 ` Theodore Tso
@ 2009-01-08 13:03 ` Andreas Dilger
  2009-01-08 17:20   ` Curt Wohlgemuth
  1 sibling, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2009-01-08 13:03 UTC (permalink / raw)
To: Curt Wohlgemuth; +Cc: linux-ext4

On Jan 07, 2009  11:29 -0800, Curt Wohlgemuth wrote:
> Iozone was run with the following command line:
>
>    iozone -t (# threads) -s 2g -r 256k -I -T -i0 -i1 -i2
>
> I.e., throughput mode; 2GiB file; 256KiB buffer; O_DIRECT.  Tests were
> limited to

How much RAM is on the test system?  If the file size is only 2GB then
it will likely fit into RAM, which is possibly why the performance
numbers of all the filesystems are so close together.  The other
possibility is that a single disk is the performance bottleneck and
all of the filesystems can feed a single disk at a reasonable rate.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
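Andreas's question reduces to quick arithmetic; a sketch like the following
makes it concrete. The 4 GiB RAM figure is hypothetical (the original post
never states the machine's memory), and note the runs used O_DIRECT, so file
data largely bypasses the page cache anyway:

```shell
#!/bin/sh
# Does the benchmark working set fit in RAM?  All sizes in KiB.
# The 4 GiB memory figure below is an assumption -- the thread does
# not say how much RAM the test machine had.
fits_in_ram() {
    mem_kib=$1; threads=$2; file_kib=$3
    if [ $((threads * file_kib)) -le "$mem_kib" ]; then
        echo "fits in RAM"
    else
        echo "exceeds RAM"
    fi
}

MEM=$((4 * 1024 * 1024))     # hypothetical 4 GiB, in KiB
FILE=$((2 * 1024 * 1024))    # iozone's -s 2g file, in KiB

fits_in_ram "$MEM" 1 "$FILE"   # single-thread run: fits
fits_in_ram "$MEM" 8 "$FILE"   # 8-thread run (8 x 2 GiB): exceeds
```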
* Re: Ext4 without a journal: some benchmark results
  2009-01-08 13:03 ` Andreas Dilger
@ 2009-01-08 17:20   ` Curt Wohlgemuth
  0 siblings, 0 replies; 6+ messages in thread
From: Curt Wohlgemuth @ 2009-01-08 17:20 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4

Hi Andreas:

On Thu, Jan 8, 2009 at 5:03 AM, Andreas Dilger <adilger@sun.com> wrote:
> On Jan 07, 2009  11:29 -0800, Curt Wohlgemuth wrote:
>> Iozone was run with the following command line:
>>
>>    iozone -t (# threads) -s 2g -r 256k -I -T -i0 -i1 -i2
>>
>> I.e., throughput mode; 2GiB file; 256KiB buffer; O_DIRECT.  Tests were
>> limited to
>
> How much RAM is on the test system?  If the file size is only 2GB then
> it will likely fit into RAM, which is possibly why the performance
> numbers of all the filesystems are so close together.  The other
> possibility is that a single disk is the performance bottleneck and
> all of the filesystems can feed a single disk at a reasonable rate.

Indeed, the system was not memory-limited at all.

I've done some playing around with how limiting memory affects random
reads in iozone with O_DIRECT, and have found that, as expected, ext4
is much less affected than ext2.  I'm assuming this is because the
metadata isn't in the page cache, and the far larger number of
metadata blocks on ext2 than ext4 in this case causes a bigger hit on
ext2.

If I generate numbers on a low-memory system, I'll post them here too.

Thanks,
Curt