ocfs2-devel.oss.oracle.com archive mirror
 help / color / mirror / Atom feed
From: Tristan Ye <tristan.ye@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 00/16] Ocfs2: Online defragmentaion V4.
Date: Fri, 18 Mar 2011 14:35:27 +0800	[thread overview]
Message-ID: <1300430143-23909-1-git-send-email-tristan.ye@oracle.com> (raw)

 *. Let defrag handle partial extent moving

 *. Incorporate Mark's comments.

 *. Set several trivial constraints for threshold.


Rebased on 2.6.38:

http://oss.oracle.com/git/tye/ocfs2-tools.git/?p=tye/linux-2.6.git;a=shortlog;h=move_extents

-------------------------------------------------------------------------------
Changes since v2:

 *. Add refcount support.

 *. Share Copy-On-Writes codes with refcounttree.c

 *. Re-organize the ordering of patches.

 *. Fix several trivial bugs.

-------------------------------------------------------------------------------
Changes since v1:

 *. implement following #2 strategy(simple extent_moving).

	It's a quite rough patches series v2 for online defrag/ext_moving on OCFS2, it's
workable anyway, may look ugly though;) The essence of online file defragmentation is
extents moving like what btrfs and ext4 were doing, adding 'OCFS2_IOC_MOVE_EXT' ioctl
to ocfs2 allows two strategies upon defragmentation:

1. simple-defragmentation-in-kernl, which means kernel will be responsible for
   claiming new clusters, and packing the defragmented extents according to a
   user-specified threshold.

2. simple-extents moving, in this case, userspace play much more important role
   when doing defragmentation, it needs to specify the new physical blk_offset
   where extents will be moved, kernel itself will not do anything more than
   moving the extents per requested, maybe kernel also needs to manage to
   probe/validate the new_blkoffset to guarantee enough free space around there.

Above two operations using the same OCFS2_IOC_MOVE_EXT:
-------------------------------------------------------------------------------
#define OCFS2_MOVE_EXT_FL_AUTO_DEFRAG   (0x00000001)    /* Kernel manages to
                                                           claim new clusters
                                                           as the goal place
                                                           for extents moving */
#define OCFS2_MOVE_EXT_FL_COMPLETE      (0x00000002)    /* Move or defragmenation
                                                           completely gets done.
                                                         */
struct ocfs2_move_extents {
/* All values are in bytes */
        /* in */
        __u64 me_start;         /* Virtual start in the file to move */
        __u64 me_len;           /* Length of the extents to be moved */
        __u64 me_goal;          /* Physical offset of the goal */
        __u64 me_thresh;        /* Maximum distance from goal or threshold
                                   for auto defragmentation */
        __u64 me_flags;         /* flags for the operation:
                                 * - auto defragmentation.
                                 * - refcount,xattr cases.
                                 */

        /* out */
        __u64 me_moved_len;     /* moved length, are we completely done? */
        __u64 me_new_offset;    /* Resulting physical location */
        __u32 me_reserved[2];   /* reserved for futhure */
};
-------------------------------------------------------------------------------

	Following are some interesting data gathered from simple tests:

1. Performance improvement gained on I/O reads:
-------------------------------------------------------------------------------
* Before defragmentation *

[root at ocfs2-box4 ~]# sync
[root at ocfs2-box4 ~]# echo 3>/proc/sys/vm/drop_caches 
[root at ocfs2-box4 ~]# time dd if=/storage/testfile-1 of=/dev/null
640000+0 records in
640000+0 records out
327680000 bytes (328 MB) copied, 19.9351 s, 16.4 MB/s

real	0m19.954s
user	0m0.246s
sys	0m1.111s

* Do defragmentation *

[root at ocfs2-box4 defrag]# ./defrag -s 0 -l 293601280  -t 3145728 /storage/testfile-1

* After defragmentation *

[root at ocfs2-box4 ~]# sync
[root at ocfs2-box4 ~]# echo 3>/proc/sys/vm/drop_caches
[root at ocfs2-box4 ~]# time dd if=/storage/testfile-1 of=/dev/null
640000+0 records in
640000+0 records out
327680000 bytes (328 MB) copied, 6.79885 s, 48.2 MB/s

real	0m6.969s
user	0m0.209s
sys	0m1.063s
-------------------------------------------------------------------------------


2. Extent tree layout via debugfs.ocfs2:
-------------------------------------------------------------------------------
* Before defragmentation *

        Tree Depth: 1   Count: 243   Next Free Rec: 8
        ## Offset        Clusters       Block#
        0  0             1173           86561
        1  1173          1173           84527
        2  2346          1151           81468
        3  3497          1173           76362
        4  4670          1173           74328
        5  5843          1172           66150
        6  7015          1460           70260
        7  8475          662            87680
        SubAlloc Bit: 1   SubAlloc Slot: 0
        Blknum: 86561   Next Leaf: 84527
        CRC32: abf06a6b   ECC: 44bc
        Tree Depth: 0   Count: 252   Next Free Rec: 252
        ## Offset        Clusters       Block#          Flags
        0  1             16             516104          0x0
        1  17            1              554632          0x0
        2  18            7              560144          0x0
        3  25            1              565960          0x0
        4  26            1              572632          0x
	...
	/* around 1700 extent records were hidden there */
	...
	138 9131          1              258968          0x0
        139 9132          1              259568          0x0
        140 9133          1              260168          0x0
        141 9134          1              260768          0x0
        142 9135          1              261368          0x0
        143 9136          1              261968          0x0

* After defragmentation *

      Tree Depth: 1   Count: 243   Next Free Rec: 1
	## Offset        Clusters       Block#
	0  0             9137           66081
	SubAlloc Bit: 1   SubAlloc Slot: 0
	Blknum: 66081   Next Leaf: 0
	CRC32: 22897d34   ECC: 0619
	Tree Depth: 0   Count: 252   Next Free Rec: 6
	## Offset        Clusters       Block#          Flags
	0  1             1600           4412936         0x0 
	1  1601          1595           20669448        0x0 
	2  3196          1600           9358856         0x0 
	3  4796          1404           14516232        0x0 
	4  6200          1600           21627400        0x0 
	5  7800          1337           7483400         0x0 
-------------------------------------------------------------------------------


TO-DO:

1. Adding refcount/xattr support.
2. Free space defragmentation.


Go to http://oss.oracle.com/osswiki/OCFS2/DesignDocs/OnlineDefrag for more details.


Tristan.

             reply	other threads:[~2011-03-18  6:35 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-18  6:35 Tristan Ye [this message]
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 01/16] Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 02/16] Ocfs2/refcounttree: Publicate couple of funcs from refcounttree.c Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 03/16] Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2 Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 04/16] Ocfs2/move_extents: Add basic framework and source files for extent moving Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 05/16] Ocfs2/move_extents: lock allocators and reserve metadata blocks and data clusters for extents moving Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 06/16] Ocfs2/move_extents: move a range of extent Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 07/16] Ocfs2/move_extents: defrag " Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 08/16] Ocfs2/move_extents: find the victim alloc group, where the given #blk fits Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 09/16] Ocfs2/move_extents: helper to validate and adjust moving goal Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 10/16] Ocfs2/move_extents: helper to probe a proper region to move in an alloc group Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 11/16] Ocfs2/move_extents: helpers to update the group descriptor and global bitmap inode Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 12/16] Ocfs2/move_extents: move entire/partial extent Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 13/16] Ocfs2/move_extents: helper to calculate the defraging length in one run Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 14/16] Ocfs2/move_extents: move/defrag extents within a certain range Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 15/16] Ocfs2/move_extents: Let defrag handle partial extent moving Tristan Ye
2011-03-18  6:35 ` [Ocfs2-devel] [PATCH 16/16] Ocfs2/move_extents: Set several trivial constraints for threshold Tristan Ye

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1300430143-23909-1-git-send-email-tristan.ye@oracle.com \
    --to=tristan.ye@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).