From: Tristan Ye <tristan.ye@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 00/16] Ocfs2: Online defragmentation V4.
Date: Fri, 18 Mar 2011 14:35:27 +0800
Message-ID: <1300430143-23909-1-git-send-email-tristan.ye@oracle.com>
Changes since v3:
*. Let defrag handle partial extent moving.
*. Incorporate Mark's comments.
*. Set several trivial constraints for threshold.
Rebased on 2.6.38:
http://oss.oracle.com/git/tye/ocfs2-tools.git/?p=tye/linux-2.6.git;a=shortlog;h=move_extents
-------------------------------------------------------------------------------
Changes since v2:
*. Add refcount support.
*. Share copy-on-write code with refcounttree.c.
*. Re-organize the ordering of patches.
*. Fix several trivial bugs.
-------------------------------------------------------------------------------
Changes since v1:
*. Implement strategy #2 described below (simple extent moving).
This is a fairly rough patch series for online defrag/extent moving on OCFS2. It
works, though it may look ugly ;) The essence of online file defragmentation is
extent moving, as btrfs and ext4 do. Adding the 'OCFS2_IOC_MOVE_EXT' ioctl to
ocfs2 allows two strategies for defragmentation:
1. Simple defragmentation in the kernel: the kernel is responsible for
claiming new clusters and packing the defragmented extents according to a
user-specified threshold.
2. Simple extent moving: userspace plays a much more important role here.
It must specify the new physical block offset that the extents will be
moved to, and the kernel does nothing more than move the extents as
requested. The kernel may also need to probe/validate the new block offset
to guarantee that there is enough free space around it.
Both operations use the same OCFS2_IOC_MOVE_EXT ioctl:
-------------------------------------------------------------------------------
#define OCFS2_MOVE_EXT_FL_AUTO_DEFRAG	(0x00000001)	/* Kernel manages to claim
							   new clusters as the goal
							   place for extent moving */
#define OCFS2_MOVE_EXT_FL_COMPLETE	(0x00000002)	/* Move or defragmentation
							   completely gets done */

struct ocfs2_move_extents {
/* All values are in bytes */
	/* in */
	__u64 me_start;		/* Virtual start in the file to move */
	__u64 me_len;		/* Length of the extents to be moved */
	__u64 me_goal;		/* Physical offset of the goal */
	__u64 me_thresh;	/* Maximum distance from goal or threshold
				   for auto defragmentation */
	__u64 me_flags;		/* Flags for the operation:
				 * - auto defragmentation.
				 * - refcount, xattr cases.
				 */
	/* out */
	__u64 me_moved_len;	/* Moved length, are we completely done? */
	__u64 me_new_offset;	/* Resulting physical location */
	__u32 me_reserved[2];	/* Reserved for future use */
};
-------------------------------------------------------------------------------
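To make the two strategies concrete, here is a minimal userspace sketch of how a request could be filled in for each of them. It is only an illustration under assumptions: the struct is mirrored locally with uint64_t/uint32_t standing in for __u64/__u32, the helper names (me_init_auto_defrag, me_init_move_to_goal, me_is_complete) are hypothetical, and it assumes the kernel reports completion by setting OCFS2_MOVE_EXT_FL_COMPLETE in me_flags on return, which the flag comment suggests but this letter does not spell out.

```c
#include <stdint.h>
#include <string.h>

/* Flag values as quoted in this cover letter. */
#define OCFS2_MOVE_EXT_FL_AUTO_DEFRAG	(0x00000001)
#define OCFS2_MOVE_EXT_FL_COMPLETE	(0x00000002)

/* Local mirror of the struct; fixed-width types replace __u64/__u32. */
struct ocfs2_move_extents {
	uint64_t me_start;	/* in: virtual start in the file, bytes */
	uint64_t me_len;	/* in: length to move, bytes */
	uint64_t me_goal;	/* in: physical goal offset, bytes */
	uint64_t me_thresh;	/* in: max distance from goal, or defrag threshold */
	uint64_t me_flags;	/* in/out (assumed): operation flags */
	uint64_t me_moved_len;	/* out: moved length */
	uint64_t me_new_offset;	/* out: resulting physical location */
	uint32_t me_reserved[2];
};

/* Strategy 1 (hypothetical helper): kernel picks the new clusters itself. */
static void me_init_auto_defrag(struct ocfs2_move_extents *me,
				uint64_t start, uint64_t len, uint64_t thresh)
{
	memset(me, 0, sizeof(*me));
	me->me_start = start;
	me->me_len = len;
	me->me_thresh = thresh;
	me->me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG;
}

/* Strategy 2 (hypothetical helper): userspace names the physical goal. */
static void me_init_move_to_goal(struct ocfs2_move_extents *me,
				 uint64_t start, uint64_t len,
				 uint64_t goal, uint64_t max_distance)
{
	memset(me, 0, sizeof(*me));
	me->me_start = start;
	me->me_len = len;
	me->me_goal = goal;
	me->me_thresh = max_distance;
	/* AUTO_DEFRAG is left clear, so me_goal is what counts. */
}

/* Assumption: the kernel sets the COMPLETE bit once the whole range is done. */
static int me_is_complete(const struct ocfs2_move_extents *me)
{
	return (me->me_flags & OCFS2_MOVE_EXT_FL_COMPLETE) != 0;
}
```

A caller would fill the request with one of the helpers, pass it to ioctl(2) on an open file descriptor with OCFS2_IOC_MOVE_EXT, and then look at me_moved_len and me_new_offset. The ioctl request number itself is defined in patch 03 and is not reproduced here, so the actual call is left out of the sketch.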
The following are some interesting data gathered from simple tests:
1. Performance improvement gained on I/O reads:
-------------------------------------------------------------------------------
* Before defragmentation *
[root@ocfs2-box4 ~]# sync
[root@ocfs2-box4 ~]# echo 3>/proc/sys/vm/drop_caches
[root@ocfs2-box4 ~]# time dd if=/storage/testfile-1 of=/dev/null
640000+0 records in
640000+0 records out
327680000 bytes (328 MB) copied, 19.9351 s, 16.4 MB/s
real 0m19.954s
user 0m0.246s
sys 0m1.111s
* Do defragmentation *
[root@ocfs2-box4 defrag]# ./defrag -s 0 -l 293601280 -t 3145728 /storage/testfile-1
* After defragmentation *
[root@ocfs2-box4 ~]# sync
[root@ocfs2-box4 ~]# echo 3>/proc/sys/vm/drop_caches
[root@ocfs2-box4 ~]# time dd if=/storage/testfile-1 of=/dev/null
640000+0 records in
640000+0 records out
327680000 bytes (328 MB) copied, 6.79885 s, 48.2 MB/s
real 0m6.969s
user 0m0.209s
sys 0m1.063s
-------------------------------------------------------------------------------
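The dd figures above can be cross-checked with a little arithmetic. The sketch below recomputes the throughput from the byte count and elapsed time, and derives the read speedup. The function names are hypothetical, and MB is taken as 10^6 bytes, which is the unit dd uses in its summary line.

```c
/* Throughput in MB/s, with MB = 10^6 bytes as in dd's summary line. */
static double throughput_mb_per_s(double bytes, double seconds)
{
	return bytes / seconds / 1e6;
}

/* Ratio of elapsed times: values above 1.0 mean the second run was faster. */
static double read_speedup(double before_seconds, double after_seconds)
{
	return before_seconds / after_seconds;
}
```

With the numbers from the two runs (327680000 bytes in 19.9351 s and in 6.79885 s), this reproduces the 16.4 MB/s and 48.2 MB/s that dd printed, and gives a read speedup of roughly 2.9x.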
2. Extent tree layout via debugfs.ocfs2:
-------------------------------------------------------------------------------
* Before defragmentation *
Tree Depth: 1 Count: 243 Next Free Rec: 8
##   Offset   Clusters   Block#
0    0        1173       86561
1    1173     1173       84527
2    2346     1151       81468
3    3497     1173       76362
4    4670     1173       74328
5    5843     1172       66150
6    7015     1460       70260
7    8475     662        87680
SubAlloc Bit: 1 SubAlloc Slot: 0
Blknum: 86561 Next Leaf: 84527
CRC32: abf06a6b ECC: 44bc
Tree Depth: 0 Count: 252 Next Free Rec: 252
##    Offset   Clusters   Block#    Flags
0     1        16         516104    0x0
1     17       1          554632    0x0
2     18       7          560144    0x0
3     25       1          565960    0x0
4     26       1          572632    0x
...
/* around 1700 extent records are omitted here */
...
138   9131     1          258968    0x0
139   9132     1          259568    0x0
140   9133     1          260168    0x0
141   9134     1          260768    0x0
142   9135     1          261368    0x0
143   9136     1          261968    0x0
* After defragmentation *
Tree Depth: 1 Count: 243 Next Free Rec: 1
##   Offset   Clusters   Block#
0    0        9137       66081
SubAlloc Bit: 1 SubAlloc Slot: 0
Blknum: 66081 Next Leaf: 0
CRC32: 22897d34 ECC: 0619
Tree Depth: 0 Count: 252 Next Free Rec: 6
##   Offset   Clusters   Block#     Flags
0    1        1600       4412936    0x0
1    1601     1595       20669448   0x0
2    3196     1600       9358856    0x0
3    4796     1404       14516232   0x0
4    6200     1600       21627400   0x0
5    7800     1337       7483400    0x0
-------------------------------------------------------------------------------
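The leaf listings above can be summarized mechanically. The sketch below is not part of the patches or of debugfs.ocfs2. It is a hypothetical helper that takes (offset, clusters) pairs copied from such a listing, sums the clusters they cover, and checks that each record starts where the previous one ends. The struct and function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a leaf listing: logical offset and length, both in clusters. */
struct extent_rec {
	uint32_t offset;
	uint32_t clusters;
};

/* Total number of clusters covered by the records. */
static uint64_t extent_total_clusters(const struct extent_rec *recs, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += recs[i].clusters;
	return total;
}

/* Returns 1 if every record begins right after the previous one ends. */
static int extent_logically_contiguous(const struct extent_rec *recs, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if ((uint64_t)recs[i - 1].offset + recs[i - 1].clusters !=
		    recs[i].offset)
			return 0;
	}
	return 1;
}
```

Fed with the six records of the leaf after defragmentation, the helpers report 9136 clusters in one logically contiguous run, so the same range that previously needed many small records is now described by six.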
TO-DO:
1. Adding refcount/xattr support.
2. Free space defragmentation.
Go to http://oss.oracle.com/osswiki/OCFS2/DesignDocs/OnlineDefrag for more details.
Tristan.