Re: Journal file fragmentation

From: "Jose R. Santos" <jrs@us.ibm.com>
To: "Frédéric Bohé" <frederic.bohe@bull.net>
Cc: linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: Journal file fragmentation
Date: Wed, 27 Aug 2008 15:12:45 -0500
Message-ID: <20080827151245.761d38b0@ichigo>
In-Reply-To: <1219858567.3591.64.camel@frecb007923.frec.bull.fr>

On Wed, 27 Aug 2008 19:36:07 +0200
Frédéric Bohé <frederic.bohe@bull.net> wrote:

> While playing with filesystems using flex_bg, I noticed that the journal
> file may be fragmented when there is a lot of metadata in the first
> flex group.
> For example, after running this command: mkfs.ext4 -t ext4dev -G512 /dev/sdb1
> the journal file (inode 8) is reported by "stat <8>" in debugfs as follows:
> 
> Inode: 8   Type: regular    Mode:  0600   Flags: 0x0
> Generation: 0    Version: 0x00000000
> User:     0   Group:     0   Size: 134217728
> File ACL: 0    Directory ACL: 0
> Links: 1   Blockcount: 262416
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x48b4a426 -- Wed Aug 27 02:47:34 2008
> atime: 0x00000000 -- Thu Jan  1 01:00:00 1970
> mtime: 0x48b4a426 -- Wed Aug 27 02:47:34 2008
> Size of extra inode fields: 0
> BLOCKS:
> (0-11):28679-28690, (IND):28691, (12-1035):28692-29715, (DIND):29716,
> (IND):29717, (1036-2059):29718-30741, (IND):30742,
> (2060-3083):30743-31766, (IND):31767, (3084-4083):31768-32767,
> (4084-4107):94209-94232, (IND):94233, (4108-5131):94234-95257,
> (IND):95258, (5132-6155):95259-96282, (IND):96283,
> (6156-7179):96284-97307, (IND):97308, (7180-8174):97309-98303,
> (8175-8203):159745-159773, (IND):159774, (8204-9227):159775-160798,
> (IND):160799, (9228-10251):160800-161823, (IND):161824,
> (10252-11275):161825-162848, (IND):162849, (11276-12265):162850-163839,
> (12266-12299):225281-225314, (IND):225315, (12300-13323):225316-226339,
> (IND):226340, (13324-14347):226341-227364, (IND):227365,
> (14348-15371):227366-228389, (IND):228390, (15372-16356):228391-229375,
> (16357-16395):284673-284711, (IND):284712, (16396-17419):284713-285736,
> (IND):285737, (17420-18443):285738-286761, (IND):286762,
> (18444-19467):286763-287786, (IND):287787, (19468-20491):287788-288811,
> (IND):288812, (20492-21515):288813-289836, (IND):289837,
> (21516-22539):289838-290861, (IND):290862, (22540-23563):290863-291886,
> (IND):291887, (23564-24587):291888-292911, (IND):292912,
> (24588-25611):292913-293936, (IND):293937, (25612-26585):293938-294911,
> (26586-26635):295937-295986, (IND):295987, (26636-27659):295988-297011,
> (IND):297012, (27660-28683):297013-298036, (IND):298037,
> (28684-29707):298038-299061, (IND):299062, (29708-30731):299063-300086,
> (IND):300087, (30732-31755):300088-301111, (IND):301112,
> (31756-32768):301113-302125
> TOTAL: 32802
> 
> This journal file is split into 6 parts: blocks 28679-32767, then
> 94209-98303, then 159745-163839, then 225281-229375, then 284673-294911,
> and finally 295937-302125.
> 
> Of course, "-G512" on the mkfs command line is an extreme case, but it
> clearly shows the fragmentation.
> 
> I've tried to find out whether this fragmentation has any performance
> impact, so I quickly wrote the following patch for the mkfs program:
> 
> Index: e2fsprogs/lib/ext2fs/mkjournal.c
> ===================================================================
> --- e2fsprogs.orig/lib/ext2fs/mkjournal.c       2008-08-27 02:37:59.000000000 +0200
> +++ e2fsprogs/lib/ext2fs/mkjournal.c    2008-08-27 14:51:02.000000000 +0200
> @@ -220,7 +220,11 @@ static int mkjournal_proc(ext2_filsys      fs
>                 last_blk = *blocknr;
>                 return 0;
>         }
> -       retval = ext2fs_new_block(fs, last_blk, 0, &new_blk);
> +       retval = ext2fs_get_free_blocks(fs, ref_block,
> +                                       fs->super->s_blocks_count,
> +                                       es->num_blocks, fs->block_map,
> +                                       &new_blk);
> +
>         if (retval) {
>                 es->err = retval;
>                 return BLOCK_ABORT;
> 
> This makes mkfs take a bit longer, but it ends up with an unfragmented
> journal file: "stat <8>" in debugfs reports that the journal file uses
> contiguous blocks from 295937 to 328738.

The problem with this approach is that mkfs will take even longer as
you make -G xxx larger, since ext2fs_get_free_blocks() is not very smart
at finding a large number of contiguous blocks.  If I understand this
correctly, the main problem here is that we start the new block
search from block 0.  A better approach would be to start the
ext2fs_new_block() search from the last block of the last inode table
in a flex_bg.  This way we avoid the fragmentation we see when the
inode tables for a flex_bg are larger than the capacity of a single
block group.


> Then I ran bonnie++ to test the performance impact. This is my
> test script:
> 
> mkfs.ext4 -t ext4dev -G512 /dev/sdb1
> mount -t ext4dev -o data=journal /dev/sdb1 /mnt/test
> bonnie++ -u root -s 0 -n 4000 -d /mnt/test/
> 
> And the results:
> 
> Without patch:
> 
> Version 1.03d       ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                4000  3978   7   602   0   518   1  3962   8   520   0   326   1
> 
> With patch:
> 
> Version 1.03d       ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                4000  4180   8   736   1   543   1  4029   8   556   0   335   1
> 
> Difference (with patch vs. without):
> 
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>                       +5.0%     +22%      +4.8%     +1.6%     +6.9%     +2.7%
> 
> Conclusion:
> 
> First, the largest performance improvements are on read operations,
> which, if I am not mistaken, have nothing to do with the journal file.
> This is surprising and may indicate that those results are wrong, but I
> can't see why right now.
> Second, there is a slight improvement on write operations, so the
> journal file defragmentation seems to have a positive impact in this test.
> 
> I'm still bothered by the performance increase in reads, so I will run
> some more tests to see whether it is consistent.
> 
> Please feel free to send me any comments you may have on this subject.
> 
> Thanks.
> 
> Frederic

-JRS

Thread overview: 14+ messages
2008-08-27 17:36 Journal file fragmentation Frédéric Bohé
2008-08-27 20:12 ` Jose R. Santos [this message]
2008-08-27 21:06 ` Theodore Tso
2008-08-27 21:14   ` [PATCH 1/4] ext2fs_mkjournal(): Don't allocate an extra block to the journal Theodore Ts'o
2008-08-27 21:14     ` [PATCH 2/4] Create the journal in the middle of the filesystem Theodore Ts'o
2008-08-27 21:14       ` [PATCH 3/4] ext2fs_block_iterate2: Add BLOCK_FLAG_APPEND support for extent-based files Theodore Ts'o
2008-08-27 21:14         ` [PATCH 4/4] If the filesystem supports extents create an extent-based journal inode Theodore Ts'o
2008-08-28  9:55       ` [PATCH 2/4] Create the journal in the middle of the filesystem Frédéric Bohé
2008-08-28 13:34         ` Theodore Tso
2008-08-28 13:40           ` Ric Wheeler
2008-08-28 14:36             ` Theodore Tso
2008-08-28 14:38               ` Ric Wheeler
2008-09-03 14:08                 ` Ric Wheeler
2008-08-28 16:16           ` Frédéric Bohé
