* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement
@ 2008-11-28 6:49 Tao Ma
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Tao Ma @ 2008-11-28 6:49 UTC (permalink / raw)
To: ocfs2-devel
Hi all,
In ocfs2, when we create a fresh file system and create inodes in it,
they are contiguous and good for readdir+stat. While if we delete all
the inodes and created again, the new inodes will get spread out and
that isn't what we need. The core problem here is that the inode block
search looks for the "emptiest" inode group to allocate from. So if an
inode alloc file has many equally (or almost equally) empty groups, new
inodes will tend to get spread out amongst them, which in turn can put
them all over the disk. This is undesirable because directory operations
on conceptually "nearby" inodes force a large number of seeks. For more
details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.(I
have modified it a little, Mark, if you are interested, please look at
it. They are underlined.)
So this patch set try to fix this problem.
patch 1: Optimize inode allocation by remembering last group.
We add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming new inode,
we passed in directory's inode so that the allocation can use this
information.
patch 2: let the Inode group allocs use the global bitmap directly.
patch 3: we add osb_last_alloc_group in ocfs2_super to record the last
used allocation group so that we can make inode groups contiguous enough.
The patch set are based on Mark's merge_window, so they are easy to push.
Regards,
Tao
^ permalink raw reply [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.
2008-11-28 6:49 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Tao Ma
@ 2008-11-27 22:58 ` Tao Ma
2009-01-06 19:02 ` Mark Fasheh
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap Tao Ma
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Tao Ma @ 2008-11-27 22:58 UTC (permalink / raw)
To: ocfs2-devel
In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.
The good thing is that in ocfs2_alloc_context, there is a field named
ac_last_group which will record the last group we allocate from. So
we can only pass the right group to it and the following allocation
will do as what we expect.
So we add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming new inode,
we passed in directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/inode.c | 2 ++
fs/ocfs2/inode.h | 4 ++++
fs/ocfs2/namei.c | 4 ++--
fs/ocfs2/suballoc.c | 21 +++++++++++++++++++++
fs/ocfs2/suballoc.h | 2 ++
5 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 288512c..c3463c1 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -350,6 +350,8 @@ void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
ocfs2_set_inode_flags(inode);
+ OCFS2_I(inode)->ip_last_used_slot = 0;
+ OCFS2_I(inode)->ip_last_used_group = 0;
mlog_exit_void();
}
diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index eb3c302..e1978ac 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -72,6 +72,10 @@ struct ocfs2_inode_info
struct inode vfs_inode;
struct jbd2_inode ip_jinode;
+
+ /* Only valid if the inode is the dir. */
+ u32 ip_last_used_slot;
+ u64 ip_last_used_group;
};
/*
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 02c8026..a601dd5 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -469,8 +469,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
*new_fe_bh = NULL;
- status = ocfs2_claim_new_inode(osb, handle, inode_ac, &suballoc_bit,
- &fe_blkno);
+ status = ocfs2_claim_new_inode(osb, handle, dir, parent_fe_bh,
+ inode_ac, &suballoc_bit, &fe_blkno);
if (status < 0) {
mlog_errno(status);
goto leave;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 226fe21..f75782f 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1587,6 +1587,8 @@ bail:
int ocfs2_claim_new_inode(struct ocfs2_super *osb,
handle_t *handle,
+ struct inode *dir,
+ struct buffer_head *parent_fe_bh,
struct ocfs2_alloc_context *ac,
u16 *suballoc_bit,
u64 *fe_blkno)
@@ -1594,6 +1596,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
int status;
unsigned int num_bits;
u64 bg_blkno;
+ struct ocfs2_dinode *parent_fe =
+ (struct ocfs2_dinode *)parent_fe_bh->b_data;
mlog_entry_void();
@@ -1602,6 +1606,21 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
BUG_ON(ac->ac_bits_wanted != 1);
BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
+ /*
+ * Try to allocate inodes from some specific group.
+ *
+ * If the parent dir has recorded the last group used in allocation,
+ * cool, use it. Otherwise if we try to allocate new inode from the
+ * same slot the parent dir belongs to, use the same chunk.
+ */
+ if (OCFS2_I(dir)->ip_last_used_group &&
+ OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
+ ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
+ else if (le16_to_cpu(parent_fe->i_suballoc_slot) ==
+ ac->ac_alloc_slot)
+ ac->ac_last_group = le64_to_cpu(parent_fe->i_blkno) -
+ le16_to_cpu(parent_fe->i_suballoc_bit);
+
status = ocfs2_claim_suballoc_bits(osb,
ac,
handle,
@@ -1620,6 +1639,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
*fe_blkno = bg_blkno + (u64) (*suballoc_bit);
ac->ac_bits_given++;
+ OCFS2_I(dir)->ip_last_used_group = ac->ac_last_group;
+ OCFS2_I(dir)->ip_last_used_slot = ac->ac_alloc_slot;
status = 0;
bail:
mlog_exit(status);
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index e3c13c7..ea85a4c 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -88,6 +88,8 @@ int ocfs2_claim_metadata(struct ocfs2_super *osb,
u64 *blkno_start);
int ocfs2_claim_new_inode(struct ocfs2_super *osb,
handle_t *handle,
+ struct inode *dir,
+ struct buffer_head *parent_fe_bh,
struct ocfs2_alloc_context *ac,
u16 *suballoc_bit,
u64 *fe_blkno);
--
1.5.4.GIT
^ permalink raw reply related [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
@ 2009-01-06 19:02 ` Mark Fasheh
2009-01-07 0:29 ` Tao Ma
0 siblings, 1 reply; 14+ messages in thread
From: Mark Fasheh @ 2009-01-06 19:02 UTC (permalink / raw)
To: ocfs2-devel
On Fri, Nov 28, 2008 at 06:58:43AM +0800, Tao Ma wrote:
> In ocfs2, the inode block search looks for the "emptiest" inode
> group to allocate from. So if an inode alloc file has many equally
> (or almost equally) empty groups, new inodes will tend to get
> spread out amongst them, which in turn can put them all over the
> disk. This is undesirable because directory operations on conceptually
> "nearby" inodes force a large number of seeks.
>
> The good thing is that in ocfs2_alloc_context, there is a field named
> ac_last_group which will record the last group we allocate from. So
> we can only pass the right group to it and the following allocation
> will do as what we expect.
>
> So we add ip_last_used_group in core directory inodes which records
> the last used allocation group. Another field named ip_last_used_slot
> is also added in case inode stealing happens. When claiming new inode,
> we passed in directory's inode so that the allocation can use this
> information.
> For more details, please see
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
>
> Signed-off-by: Tao Ma <tao.ma@oracle.com>
> ---
> fs/ocfs2/inode.c | 2 ++
> fs/ocfs2/inode.h | 4 ++++
> fs/ocfs2/namei.c | 4 ++--
> fs/ocfs2/suballoc.c | 21 +++++++++++++++++++++
> fs/ocfs2/suballoc.h | 2 ++
> 5 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 288512c..c3463c1 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -350,6 +350,8 @@ void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
>
> ocfs2_set_inode_flags(inode);
>
> + OCFS2_I(inode)->ip_last_used_slot = 0;
> + OCFS2_I(inode)->ip_last_used_group = 0;
> mlog_exit_void();
> }
>
> diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
> index eb3c302..e1978ac 100644
> --- a/fs/ocfs2/inode.h
> +++ b/fs/ocfs2/inode.h
> @@ -72,6 +72,10 @@ struct ocfs2_inode_info
>
> struct inode vfs_inode;
> struct jbd2_inode ip_jinode;
> +
> + /* Only valid if the inode is the dir. */
> + u32 ip_last_used_slot;
> + u64 ip_last_used_group;
> };
>
> /*
> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index 02c8026..a601dd5 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -469,8 +469,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
>
> *new_fe_bh = NULL;
>
> - status = ocfs2_claim_new_inode(osb, handle, inode_ac, &suballoc_bit,
> - &fe_blkno);
> + status = ocfs2_claim_new_inode(osb, handle, dir, parent_fe_bh,
> + inode_ac, &suballoc_bit, &fe_blkno);
> if (status < 0) {
> mlog_errno(status);
> goto leave;
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 226fe21..f75782f 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -1587,6 +1587,8 @@ bail:
>
> int ocfs2_claim_new_inode(struct ocfs2_super *osb,
> handle_t *handle,
> + struct inode *dir,
> + struct buffer_head *parent_fe_bh,
> struct ocfs2_alloc_context *ac,
> u16 *suballoc_bit,
> u64 *fe_blkno)
> @@ -1594,6 +1596,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
> int status;
> unsigned int num_bits;
> u64 bg_blkno;
> + struct ocfs2_dinode *parent_fe =
> + (struct ocfs2_dinode *)parent_fe_bh->b_data;
>
> mlog_entry_void();
>
> @@ -1602,6 +1606,21 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
> BUG_ON(ac->ac_bits_wanted != 1);
> BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
>
> + /*
> + * Try to allocate inodes from some specific group.
> + *
> + * If the parent dir has recorded the last group used in allocation,
> + * cool, use it. Otherwise if we try to allocate new inode from the
> + * same slot the parent dir belongs to, use the same chunk.
> + */
> + if (OCFS2_I(dir)->ip_last_used_group &&
> + OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
> + ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
> + else if (le16_to_cpu(parent_fe->i_suballoc_slot) ==
> + ac->ac_alloc_slot)
> + ac->ac_last_group = le64_to_cpu(parent_fe->i_blkno) -
> + le16_to_cpu(parent_fe->i_suballoc_bit);
You should use ocfs2_which_suballoc_group() here, instead of open coding the
math to get ac_last_group.
Also, would it be possible for us to put this block in it's own function so
that it's easier to play with the logic in the future?
One last thing - can you add to the comment:
*
* We are very careful here to avoid the mistake of setting ac_last_group to
* a group descriptor from a different (unlocked) slot.
*/
--Mark
--
Mark Fasheh
^ permalink raw reply [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.
2009-01-06 19:02 ` Mark Fasheh
@ 2009-01-07 0:29 ` Tao Ma
0 siblings, 0 replies; 14+ messages in thread
From: Tao Ma @ 2009-01-07 0:29 UTC (permalink / raw)
To: ocfs2-devel
Mark Fasheh wrote:
>> @@ -1602,6 +1606,21 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
>> BUG_ON(ac->ac_bits_wanted != 1);
>> BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
>>
>> + /*
>> + * Try to allocate inodes from some specific group.
>> + *
>> + * If the parent dir has recorded the last group used in allocation,
>> + * cool, use it. Otherwise if we try to allocate new inode from the
>> + * same slot the parent dir belongs to, use the same chunk.
>> + */
>> + if (OCFS2_I(dir)->ip_last_used_group &&
>> + OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
>> + ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
>> + else if (le16_to_cpu(parent_fe->i_suballoc_slot) ==
>> + ac->ac_alloc_slot)
>> + ac->ac_last_group = le64_to_cpu(parent_fe->i_blkno) -
>> + le16_to_cpu(parent_fe->i_suballoc_bit);
>
> You should use ocfs2_which_suballoc_group() here, instead of open coding the
> math to get ac_last_group.
yeah, you are right. It looks that I am still not familiar with all
those helpful functions. ;)
>
> Also, would it be possible for us to put this block in it's own function so
> that it's easier to play with the logic in the future?
No problem. I will wrap it with a function.
>
>
> One last thing - can you add to the comment:
>
> *
> * We are very careful here to avoid the mistake of setting ac_last_group to
> * a group descriptor from a different (unlocked) slot.
> */
OK, thanks.
Regards,
Tao
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap.
2008-11-28 6:49 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Tao Ma
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
@ 2008-11-27 22:58 ` Tao Ma
2009-01-06 19:47 ` Mark Fasheh
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
2009-01-06 21:41 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Mark Fasheh
3 siblings, 1 reply; 14+ messages in thread
From: Tao Ma @ 2008-11-27 22:58 UTC (permalink / raw)
To: ocfs2-devel
Inode groups used to be allocated from local alloc file, but since
we want all inodes to be contiguous enough, we will try to allocate
them directly from global_bitmap.
Add a new flag named "ALLOC_NEW_GROUP_FROM_GLOBAL", if we pass this
flag into allocation, don't try local alloc.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/suballoc.c | 33 +++++++++++++++++++++------------
1 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index f75782f..98e32b2 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -47,7 +47,8 @@
#include "buffer_head_io.h"
#define NOT_ALLOC_NEW_GROUP 0
-#define ALLOC_NEW_GROUP 1
+#define ALLOC_NEW_GROUP 0x1
+#define ALLOC_NEW_GROUP_FROM_GLOBAL 0x2
#define OCFS2_MAX_INODES_TO_STEAL 1024
@@ -63,7 +64,8 @@ static int ocfs2_block_group_fill(handle_t *handle,
static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
struct inode *alloc_inode,
struct buffer_head *bh,
- u64 max_block);
+ u64 max_block,
+ int flags);
static int ocfs2_cluster_group_search(struct inode *inode,
struct buffer_head *group_bh,
@@ -115,7 +117,8 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
u16 *bg_bit_off);
static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
u32 bits_wanted, u64 max_block,
- struct ocfs2_alloc_context **ac);
+ struct ocfs2_alloc_context **ac,
+ int flags);
void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac)
{
@@ -370,7 +373,8 @@ static inline u16 ocfs2_find_smallest_chain(struct ocfs2_chain_list *cl)
static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
struct inode *alloc_inode,
struct buffer_head *bh,
- u64 max_block)
+ u64 max_block,
+ int flags)
{
int status, credits;
struct ocfs2_dinode *fe = (struct ocfs2_dinode *) bh->b_data;
@@ -390,7 +394,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
cl = &fe->id2.i_chain;
status = ocfs2_reserve_clusters_with_limit(osb,
le16_to_cpu(cl->cl_cpg),
- max_block, &ac);
+ max_block, &ac, flags);
if (status < 0) {
if (status != -ENOSPC)
mlog_errno(status);
@@ -498,7 +502,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
struct ocfs2_alloc_context *ac,
int type,
u32 slot,
- int alloc_new_group)
+ int flags)
{
int status;
u32 bits_wanted = ac->ac_bits_wanted;
@@ -554,7 +558,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
goto bail;
}
- if (alloc_new_group != ALLOC_NEW_GROUP) {
+ if (!(flags & ALLOC_NEW_GROUP)) {
mlog(0, "Alloc File %u Full: wanted=%u, free_bits=%u, "
"and we don't alloc a new group for it.\n",
slot, bits_wanted, free_bits);
@@ -563,7 +567,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
}
status = ocfs2_block_group_alloc(osb, alloc_inode, bh,
- ac->ac_max_block);
+ ac->ac_max_block, flags);
if (status < 0) {
if (status != -ENOSPC)
mlog_errno(status);
@@ -707,7 +711,9 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
atomic_set(&osb->s_num_inodes_stolen, 0);
status = ocfs2_reserve_suballoc_bits(osb, *ac,
INODE_ALLOC_SYSTEM_INODE,
- osb->slot_num, ALLOC_NEW_GROUP);
+ osb->slot_num,
+ ALLOC_NEW_GROUP |
+ ALLOC_NEW_GROUP_FROM_GLOBAL);
if (status >= 0) {
status = 0;
@@ -773,7 +779,8 @@ bail:
* things a bit. */
static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
u32 bits_wanted, u64 max_block,
- struct ocfs2_alloc_context **ac)
+ struct ocfs2_alloc_context **ac,
+ int flags)
{
int status;
@@ -790,7 +797,8 @@ static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
(*ac)->ac_max_block = max_block;
status = -ENOSPC;
- if (ocfs2_alloc_should_use_local(osb, bits_wanted)) {
+ if (!(flags & ALLOC_NEW_GROUP_FROM_GLOBAL) &&
+ ocfs2_alloc_should_use_local(osb, bits_wanted)) {
status = ocfs2_reserve_local_alloc_bits(osb,
bits_wanted,
*ac);
@@ -828,7 +836,8 @@ int ocfs2_reserve_clusters(struct ocfs2_super *osb,
u32 bits_wanted,
struct ocfs2_alloc_context **ac)
{
- return ocfs2_reserve_clusters_with_limit(osb, bits_wanted, 0, ac);
+ return ocfs2_reserve_clusters_with_limit(osb, bits_wanted, 0, ac,
+ ALLOC_NEW_GROUP);
}
/*
--
1.5.4.GIT
^ permalink raw reply related [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap.
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap Tao Ma
@ 2009-01-06 19:47 ` Mark Fasheh
0 siblings, 0 replies; 14+ messages in thread
From: Mark Fasheh @ 2009-01-06 19:47 UTC (permalink / raw)
To: ocfs2-devel
On Fri, Nov 28, 2008 at 06:58:44AM +0800, Tao Ma wrote:
> Inode groups used to be allocated from local alloc file, but since
> we want all inodes to be contiguous enough, we will try to allocate
> them directly from global_bitmap.
>
> Add a new flag named "ALLOC_NEW_GROUP_FROM_GLOBAL", if we pass this
> flag into allocation, don't try local alloc.
>
> Signed-off-by: Tao Ma <tao.ma@oracle.com>
> ---
> fs/ocfs2/suballoc.c | 33 +++++++++++++++++++++------------
> 1 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index f75782f..98e32b2 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -47,7 +47,8 @@
> #include "buffer_head_io.h"
>
> #define NOT_ALLOC_NEW_GROUP 0
> -#define ALLOC_NEW_GROUP 1
> +#define ALLOC_NEW_GROUP 0x1
> +#define ALLOC_NEW_GROUP_FROM_GLOBAL 0x2
How about "ALLOC_GROUPS_FROM_GLOBAL"? That makes it clear we're only
modifying *where* new groups come from.
> #define OCFS2_MAX_INODES_TO_STEAL 1024
>
> @@ -63,7 +64,8 @@ static int ocfs2_block_group_fill(handle_t *handle,
> static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
> struct inode *alloc_inode,
> struct buffer_head *bh,
> - u64 max_block);
> + u64 max_block,
> + int flags);
>
> static int ocfs2_cluster_group_search(struct inode *inode,
> struct buffer_head *group_bh,
> @@ -115,7 +117,8 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
> u16 *bg_bit_off);
> static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
> u32 bits_wanted, u64 max_block,
> - struct ocfs2_alloc_context **ac);
> + struct ocfs2_alloc_context **ac,
> + int flags);
Can you put the 'int flags' before the **ac return value here please?
I'm saddened that we're adding a flags field to so many functions, but after
looking through the code, I can't really come up with a better solution.
--Mark
--
Mark Fasheh
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
2008-11-28 6:49 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Tao Ma
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap Tao Ma
@ 2008-11-27 22:58 ` Tao Ma
2009-01-06 21:23 ` Mark Fasheh
2009-01-06 21:41 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Mark Fasheh
3 siblings, 1 reply; 14+ messages in thread
From: Tao Ma @ 2008-11-27 22:58 UTC (permalink / raw)
To: ocfs2-devel
In ocfs2, the block group search looks for the "emptiest" group
to allocate from. So if the allocator has many equally(or almost
equally) empty groups, new block group will tend to get spread
out amongst them.
We add osb_last_alloc_group in ocfs2_super to record the last used
allocation group so that next time we can allocate inode group
directly from it.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/ocfs2.h | 3 +++
fs/ocfs2/suballoc.c | 18 +++++++++++++++++-
fs/ocfs2/super.c | 1 +
3 files changed, 21 insertions(+), 1 deletions(-)
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 5c77798..a99b53e 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -335,6 +335,9 @@ struct ocfs2_super
struct ocfs2_node_map osb_recovering_orphan_dirs;
unsigned int *osb_orphan_wipes;
wait_queue_head_t osb_wipe_event;
+
+ /* the group we used to allocate inodes. */
+ u64 osb_last_alloc_group;
};
#define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 98e32b2..3ab1135 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -49,6 +49,7 @@
#define NOT_ALLOC_NEW_GROUP 0
#define ALLOC_NEW_GROUP 0x1
#define ALLOC_NEW_GROUP_FROM_GLOBAL 0x2
+#define ALLOC_USE_RECORD_GROUP 0x4
#define OCFS2_MAX_INODES_TO_STEAL 1024
@@ -411,6 +412,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
goto bail;
}
+ if (flags & ALLOC_USE_RECORD_GROUP && osb->osb_last_alloc_group) {
+ mlog(0, "use old allocation group %llu\n",
+ (unsigned long long)osb->osb_last_alloc_group);
+ ac->ac_last_group = osb->osb_last_alloc_group;
+ }
status = ocfs2_claim_clusters(osb,
handle,
ac,
@@ -485,6 +491,15 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
status = 0;
+
+ if (flags & ALLOC_USE_RECORD_GROUP) {
+ spin_lock(&osb->osb_lock);
+ osb->osb_last_alloc_group = ac->ac_last_group;
+ spin_unlock(&osb->osb_lock);
+ mlog(0, "after reservation, new allocation group is "
+ "%llu\n", (unsigned long long)osb->osb_last_alloc_group);
+ }
+
bail:
if (handle)
ocfs2_commit_trans(osb, handle);
@@ -713,7 +728,8 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
INODE_ALLOC_SYSTEM_INODE,
osb->slot_num,
ALLOC_NEW_GROUP |
- ALLOC_NEW_GROUP_FROM_GLOBAL);
+ ALLOC_NEW_GROUP_FROM_GLOBAL |
+ ALLOC_USE_RECORD_GROUP);
if (status >= 0) {
status = 0;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bc43138..3593759 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -843,6 +843,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
osb->osb_commit_interval = parsed_options.commit_interval;
osb->local_alloc_default_bits = ocfs2_megabytes_to_clusters(sb, parsed_options.localalloc_opt);
osb->local_alloc_bits = osb->local_alloc_default_bits;
+ osb->osb_last_alloc_group = 0;
if (osb->s_mount_opt & OCFS2_MOUNT_USRQUOTA &&
!OCFS2_HAS_RO_COMPAT_FEATURE(sb,
OCFS2_FEATURE_RO_COMPAT_USRQUOTA)) {
--
1.5.4.GIT
^ permalink raw reply related [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
@ 2009-01-06 21:23 ` Mark Fasheh
2009-01-07 0:49 ` Tao Ma
0 siblings, 1 reply; 14+ messages in thread
From: Mark Fasheh @ 2009-01-06 21:23 UTC (permalink / raw)
To: ocfs2-devel
On Fri, Nov 28, 2008 at 06:58:45AM +0800, Tao Ma wrote:
> In ocfs2, the block group search looks for the "emptiest" group
> to allocate from. So if the allocator has many equally(or almost
> equally) empty groups, new block group will tend to get spread
> out amongst them.
>
> We add osb_last_alloc_group in ocfs2_super to record the last used
> allocation group so that next time we can allocate inode group
> directly from it.
> For more details, please see
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
>
> Signed-off-by: Tao Ma <tao.ma@oracle.com>
> ---
> fs/ocfs2/ocfs2.h | 3 +++
> fs/ocfs2/suballoc.c | 18 +++++++++++++++++-
> fs/ocfs2/super.c | 1 +
> 3 files changed, 21 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
> index 5c77798..a99b53e 100644
> --- a/fs/ocfs2/ocfs2.h
> +++ b/fs/ocfs2/ocfs2.h
> @@ -335,6 +335,9 @@ struct ocfs2_super
> struct ocfs2_node_map osb_recovering_orphan_dirs;
> unsigned int *osb_orphan_wipes;
> wait_queue_head_t osb_wipe_event;
> +
> + /* the group we used to allocate inodes. */
> + u64 osb_last_alloc_group;
Can you give this a name which is more specific to the inode allocators?
That way we can just add another u64 later for the metadata allocators,
should we decide to do the same for them.
> };
>
> #define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info)
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 98e32b2..3ab1135 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -49,6 +49,7 @@
> #define NOT_ALLOC_NEW_GROUP 0
> #define ALLOC_NEW_GROUP 0x1
> #define ALLOC_NEW_GROUP_FROM_GLOBAL 0x2
> +#define ALLOC_USE_RECORD_GROUP 0x4
>
> #define OCFS2_MAX_INODES_TO_STEAL 1024
>
> @@ -411,6 +412,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
> goto bail;
> }
>
> + if (flags & ALLOC_USE_RECORD_GROUP && osb->osb_last_alloc_group) {
> + mlog(0, "use old allocation group %llu\n",
> + (unsigned long long)osb->osb_last_alloc_group);
> + ac->ac_last_group = osb->osb_last_alloc_group;
> + }
> status = ocfs2_claim_clusters(osb,
> handle,
> ac,
> @@ -485,6 +491,15 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
> alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
>
> status = 0;
> +
> + if (flags & ALLOC_USE_RECORD_GROUP) {
> + spin_lock(&osb->osb_lock);
> + osb->osb_last_alloc_group = ac->ac_last_group;
> + spin_unlock(&osb->osb_lock);
> + mlog(0, "after reservation, new allocation group is "
> + "%llu\n", (unsigned long long)osb->osb_last_alloc_group);
> + }
> +
How about we instead add a u64 * argument to ocfs2_block_group_alloc() and
ocfs2_reserve_suballoc_bits(). Inside ocfs2_block_group_alloc, we'd test for
it to be non-null before using it. That way, any other allocator code which
wants to use this mechanism only has to modify a function parameter. If
you'd rather test for the flag instead of non-null, that's ok too so long as
the end result is the same - new users just have to modify their parameters to
ocfs2_reserve_suballoc_bits().
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index bc43138..3593759 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -843,6 +843,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> osb->osb_commit_interval = parsed_options.commit_interval;
> osb->local_alloc_default_bits = ocfs2_megabytes_to_clusters(sb, parsed_options.localalloc_opt);
> osb->local_alloc_bits = osb->local_alloc_default_bits;
> + osb->osb_last_alloc_group = 0;
Won't this already be zero'd for us by ocfs2_initialize_super()?
--Mark
--
Mark Fasheh
^ permalink raw reply [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
2009-01-06 21:23 ` Mark Fasheh
@ 2009-01-07 0:49 ` Tao Ma
2009-01-07 1:03 ` Mark Fasheh
0 siblings, 1 reply; 14+ messages in thread
From: Tao Ma @ 2009-01-07 0:49 UTC (permalink / raw)
To: ocfs2-devel
Mark Fasheh wrote:
> On Fri, Nov 28, 2008 at 06:58:45AM +0800, Tao Ma wrote:
>> In ocfs2, the block group search looks for the "emptiest" group
>> to allocate from. So if the allocator has many equally(or almost
>> equally) empty groups, new block group will tend to get spread
>> out amongst them.
>>
>> We add osb_last_alloc_group in ocfs2_super to record the last used
>> allocation group so that next time we can allocate inode group
>> directly from it.
>> For more details, please see
>> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
>>
>> Signed-off-by: Tao Ma <tao.ma@oracle.com>
>> ---
>> fs/ocfs2/ocfs2.h | 3 +++
>> fs/ocfs2/suballoc.c | 18 +++++++++++++++++-
>> fs/ocfs2/super.c | 1 +
>> 3 files changed, 21 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
>> index 5c77798..a99b53e 100644
>> --- a/fs/ocfs2/ocfs2.h
>> +++ b/fs/ocfs2/ocfs2.h
>> @@ -335,6 +335,9 @@ struct ocfs2_super
>> struct ocfs2_node_map osb_recovering_orphan_dirs;
>> unsigned int *osb_orphan_wipes;
>> wait_queue_head_t osb_wipe_event;
>> +
>> + /* the group we used to allocate inodes. */
>> + u64 osb_last_alloc_group;
>
> Can you give this a name which is more specific to the inode allocators?
> That way we can just add another u64 later for the metadata allocators,
> should we decide to do the same for them.
How about osb_inode_alloc_group?
>
>
>> };
>>
>> #define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info)
>> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
>> index 98e32b2..3ab1135 100644
>> --- a/fs/ocfs2/suballoc.c
>> +++ b/fs/ocfs2/suballoc.c
>> @@ -49,6 +49,7 @@
>> #define NOT_ALLOC_NEW_GROUP 0
>> #define ALLOC_NEW_GROUP 0x1
>> #define ALLOC_NEW_GROUP_FROM_GLOBAL 0x2
>> +#define ALLOC_USE_RECORD_GROUP 0x4
>>
>> #define OCFS2_MAX_INODES_TO_STEAL 1024
>>
>> @@ -411,6 +412,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
>> goto bail;
>> }
>>
>> + if (flags & ALLOC_USE_RECORD_GROUP && osb->osb_last_alloc_group) {
>> + mlog(0, "use old allocation group %llu\n",
>> + (unsigned long long)osb->osb_last_alloc_group);
>> + ac->ac_last_group = osb->osb_last_alloc_group;
>> + }
>> status = ocfs2_claim_clusters(osb,
>> handle,
>> ac,
>> @@ -485,6 +491,15 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
>> alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
>>
>> status = 0;
>> +
>> + if (flags & ALLOC_USE_RECORD_GROUP) {
>> + spin_lock(&osb->osb_lock);
>> + osb->osb_last_alloc_group = ac->ac_last_group;
>> + spin_unlock(&osb->osb_lock);
>> + mlog(0, "after reservation, new allocation group is "
>> + "%llu\n", (unsigned long long)osb->osb_last_alloc_group);
>> + }
>> +
>
> How about we instead add a u64 * argument to ocfs2_block_group_alloc() and
> ocfs2_reserve_suballoc_bits(). Inside ocfs2_block_group_alloc, we'd test for
> it to be non-null before using it. That way, any other allocator code which
> wants to use this mechanism only has to modify a function parameter. If
> you'd rather test for the flag instead of non-null, that's ok too so long as
> the end result is the same - new users just have to modify their parameters to
> ocfs2_reserve_suballoc_bits().
yeah, you suggestion make ocfs2_reserve_suballoc_bits more generic. thanks.
>
>
>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>> index bc43138..3593759 100644
>> --- a/fs/ocfs2/super.c
>> +++ b/fs/ocfs2/super.c
>> @@ -843,6 +843,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
>> osb->osb_commit_interval = parsed_options.commit_interval;
>> osb->local_alloc_default_bits = ocfs2_megabytes_to_clusters(sb, parsed_options.localalloc_opt);
>> osb->local_alloc_bits = osb->local_alloc_default_bits;
>> + osb->osb_last_alloc_group = 0;
>
> Won't this already be zero'd for us by ocfs2_initialize_super()?
Yeah, thanks for it.
Regards,
Tao
^ permalink raw reply [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
2009-01-07 0:49 ` Tao Ma
@ 2009-01-07 1:03 ` Mark Fasheh
0 siblings, 0 replies; 14+ messages in thread
From: Mark Fasheh @ 2009-01-07 1:03 UTC (permalink / raw)
To: ocfs2-devel
On Wed, Jan 07, 2009 at 08:49:24AM +0800, Tao Ma wrote:
> >> struct ocfs2_node_map osb_recovering_orphan_dirs;
> >> unsigned int *osb_orphan_wipes;
> >> wait_queue_head_t osb_wipe_event;
> >>+
> >>+ /* the group we used to allocate inodes. */
> >>+ u64 osb_last_alloc_group;
> >
> >Can you give this a name which is more specific to the inode allocators?
> >That way we can just add another u64 later for the metadata allocators,
> >should we decide to do the same for them.
> How about osb_inode_alloc_group?
That works for me.
--Mark
--
Mark Fasheh
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement
2008-11-28 6:49 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Tao Ma
` (2 preceding siblings ...)
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
@ 2009-01-06 21:41 ` Mark Fasheh
2009-01-07 0:27 ` Tao Ma
3 siblings, 1 reply; 14+ messages in thread
From: Mark Fasheh @ 2009-01-06 21:41 UTC (permalink / raw)
To: ocfs2-devel
On Fri, Nov 28, 2008 at 02:49:19PM +0800, Tao Ma wrote:
> Hi all,
> In ocfs2, when we create a fresh file system and create inodes in
> it, they are contiguous and good for readdir+stat. While if we delete all
> the inodes and created again, the new inodes will get spread out and
> that isn't what we need. The core problem here is that the inode block
> search looks for the "emptiest" inode group to allocate from. So if an
> inode alloc file has many equally (or almost equally) empty groups, new
> inodes will tend to get spread out amongst them, which in turn can put
> them all over the disk. This is undesirable because directory operations
> on conceptually "nearby" inodes force a large number of seeks. For more
> details, please see
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.(I
> have modified it a little, Mark, if you are interested, please look at
> it. They are underlined.)
Your edits look fine. Thanks for updating the design doc.
> So this patch set try to fix this problem.
> patch 1: Optimize inode allocation by remembering last group.
> We add ip_last_used_group in core directory inodes which records
> the last used allocation group. Another field named ip_last_used_slot
> is also added in case inode stealing happens. When claiming new inode,
> we passed in directory's inode so that the allocation can use this
> information.
>
> patch 2: let the Inode group allocs use the global bitmap directly.
>
> patch 3: we add osb_last_alloc_group in ocfs2_super to record the last
> used allocation group so that we can make inode groups contiguous enough.
So, the logic in your patches is correct. As you can see, most of my
comments were more about code flow or trivial cleanups. Assuming this all
works as we expect, there shouldn't be much code for you to modify before
the patches can be put in the merge_window branch.
One things though - would you mind providing a small amount of data to show
what sort of improvement (if any) we're getting from these patches? I don't
think we need anything fancy - just enough to answer the following two
questions:
- How much does this improve our inode fragmentation level?
Any test that fragments the inode space would be appropriate for this.
We could then simply express fragmentation as some value - maybe a ratio of
adjacent inodes as compared to total # of inodes, expressed as a percentage
value. It would be nice for future testing if we had a small tool to
calculate this (maybe by libocfs2, or by just making readdir calls and
looking at inode number),
- Does the 2nd patch impact overall inode creation times in a cluster, since
we're now using the cluster bitmap instead of local alloc.
I'm thinking any of our parallel inode creation tests would be fine for
this.
Thanks,
--Mark
--
Mark Fasheh
^ permalink raw reply [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement
2009-01-06 21:41 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Mark Fasheh
@ 2009-01-07 0:27 ` Tao Ma
0 siblings, 0 replies; 14+ messages in thread
From: Tao Ma @ 2009-01-07 0:27 UTC (permalink / raw)
To: ocfs2-devel
Hi Mark,
Thanks for the review.
Mark Fasheh wrote:
> On Fri, Nov 28, 2008 at 02:49:19PM +0800, Tao Ma wrote:
>> Hi all,
>> In ocfs2, when we create a fresh file system and create inodes in
>> it, they are contiguous and good for readdir+stat. While if we delete all
>> the inodes and created again, the new inodes will get spread out and
>> that isn't what we need. The core problem here is that the inode block
>> search looks for the "emptiest" inode group to allocate from. So if an
>> inode alloc file has many equally (or almost equally) empty groups, new
>> inodes will tend to get spread out amongst them, which in turn can put
>> them all over the disk. This is undesirable because directory operations
>> on conceptually "nearby" inodes force a large number of seeks. For more
>> details, please see
>> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.(I
>> have modified it a little, Mark, if you are interested, please look at
>> it. They are underlined.)
>
> Your edits look fine. Thanks for updating the design doc.
cool.
>
>
>> So this patch set try to fix this problem.
>> patch 1: Optimize inode allocation by remembering last group.
>> We add ip_last_used_group in core directory inodes which records
>> the last used allocation group. Another field named ip_last_used_slot
>> is also added in case inode stealing happens. When claiming new inode,
>> we passed in directory's inode so that the allocation can use this
>> information.
>>
>> patch 2: let the Inode group allocs use the global bitmap directly.
>>
>> patch 3: we add osb_last_alloc_group in ocfs2_super to record the last
>> used allocation group so that we can make inode groups contiguous enough.
>
> So, the logic in your patches is correct. As you can see, most of my
> comments were more about code flow or trivial cleanups. Assuming this all
> works as we expect, there shouldn't be much code for you to modify before
> the patches can be put in the merge_window branch.
>
>
> One things though - would you mind providing a small amount of data to show
> what sort of improvement (if any) we're getting from these patches? I don't
> think we need anything fancy - just enough to answer the following two
> questions:
>
> - How much does this improve our inode fragmentation level?
Actually I have some statistics and the result is cool. ;) I will
attach it in the next round of patches.
>
> Any test that fragments the inode space would be appropriate for this.
>
> We could then simply express fragmentation as some value - maybe a ratio of
> adjacent inodes as compared to total # of inodes, expressed as a percentage
> value. It would be nice for future testing if we had a small tool to
> calculate this (maybe by libocfs2, or by just making readdir calls and
> looking at inode number),
>
>
> - Does the 2nd patch impact overall inode creation times in a cluster, since
> we're now using the cluster bitmap instead of local alloc.
No statistics here since I originally think there should be not much
difference. But I will test it and attach the result. Thanks.
Regards,
Tao
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
@ 2009-01-15 21:58 Tao Ma
2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
0 siblings, 1 reply; 14+ messages in thread
From: Tao Ma @ 2009-01-15 21:58 UTC (permalink / raw)
To: ocfs2-devel
Changelog from V1 to V2:
1. Modify some codes according to Mark's advice.
2. Attach some test statistics in the commit log of patch 3 and in
this e-mail also. See below.
Hi all,
In ocfs2, when we create a fresh file system and create inodes in it,
they are contiguous and good for readdir+stat. While if we delete all
the inodes and created again, the new inodes will get spread out and
that isn't what we need. The core problem here is that the inode block
search looks for the "emptiest" inode group to allocate from. So if an
inode alloc file has many equally (or almost equally) empty groups, new
inodes will tend to get spread out amongst them, which in turn can put
them all over the disk. This is undesirable because directory operations
on conceptually "nearby" inodes force a large number of seeks. For more
details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
So this patch set try to fix this problem.
patch 1: Optimize inode allocation by remembering last group.
We add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming new inode,
we passed in directory's inode so that the allocation can use this
information.
patch 2: let the Inode group allocs use the global bitmap directly.
patch 3: we add osb_last_alloc_group in ocfs2_super to record the last
used allocation group so that we can make inode groups contiguous enough.
I have done some basic test and the results are cool.
1. single node test:
first column is the result without inode allocation patches, and the
second one with inode allocation patched enabled. You see we have
great improvement with the second "ls -lR".
echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
real 0m20.548s 0m20.106s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null
real 0m13.965s 0m13.766s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time rm /mnt/ocfs2/linux-2.6.28/ -rf
real 0m13.198s 0m13.091s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
real 0m23.022s 0m21.360s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null
real 2m45.189s 0m15.019s
yes, that is it. ;) I don't know we can improve so much when I start up.
2. Tested with 4 nodes(megabyte switch for both cross-node
communication and iscsi), with the same command sequence(using
openmpi to run the command simultaneously). Although we spend
a lot of time in cross-node communication, we still have some
performance improvement.
the 1st tar:
real 356.22s 357.70s
the 1st ls -lR:
real 187.33s 187.32s
the rm:
real 260.68s 262.42s
the 2nd tar:
real 371.92s 358.47s
the 2nd ls:
real 197.16s 188.36s
Regards,
Tao
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
@ 2009-01-15 22:00 ` Tao Ma
0 siblings, 0 replies; 14+ messages in thread
From: Tao Ma @ 2009-01-15 22:00 UTC (permalink / raw)
To: ocfs2-devel
In ocfs2, the block group search looks for the "emptiest" group
to allocate from. So if the allocator has many equally(or almost
equally) empty groups, new block group will tend to get spread
out amongst them.
So we add osb_inode_alloc_group in ocfs2_super to record the last
used inode allocation group.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
I have done some basic test and the results are cool.
1. single node test:
first column is the result without inode allocation patches, and the
second one with inode allocation patched enabled. You see we have
great improvement with the second "ls -lR".
echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
real 0m20.548s 0m20.106s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null
real 0m13.965s 0m13.766s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time rm /mnt/ocfs2/linux-2.6.28/ -rf
real 0m13.198s 0m13.091s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
real 0m23.022s 0m21.360s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null
real 2m45.189s 0m15.019s(yes, that is it. :) )
2. Tested with 4 nodes(megabyte switch for both cross-node
communication and iscsi), with the same command sequence(using
openmpi to run the command simultaneously). Although we spend
a lot of time in cross-node communication, we still have some
performance improvement.
the 1st tar:
real 356.22s 357.70s
the 1st ls -lR:
real 187.33s 187.32s
the rm:
real 260.68s 262.42s
the 2nd tar:
real 371.92s 358.47s
the 2nd ls:
real 197.16s 188.36s
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/ocfs2.h | 3 +++
fs/ocfs2/suballoc.c | 32 ++++++++++++++++++++++++++++----
2 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index ad5c24a..f0377bd 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -335,6 +335,9 @@ struct ocfs2_super
struct ocfs2_node_map osb_recovering_orphan_dirs;
unsigned int *osb_orphan_wipes;
wait_queue_head_t osb_wipe_event;
+
+ /* the group we used to allocate inodes. */
+ u64 osb_inode_alloc_group;
};
#define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index b7a065e..4c1399c 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -66,6 +66,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
struct inode *alloc_inode,
struct buffer_head *bh,
u64 max_block,
+ u64 *last_alloc_group,
int flags);
static int ocfs2_cluster_group_search(struct inode *inode,
@@ -407,6 +408,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
struct inode *alloc_inode,
struct buffer_head *bh,
u64 max_block,
+ u64 *last_alloc_group,
int flags)
{
int status, credits;
@@ -444,6 +446,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
goto bail;
}
+ if (last_alloc_group && *last_alloc_group != 0) {
+ mlog(0, "use old allocation group %llu for block group alloc\n",
+ (unsigned long long)*last_alloc_group);
+ ac->ac_last_group = *last_alloc_group;
+ }
status = ocfs2_claim_clusters(osb,
handle,
ac,
@@ -518,6 +525,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
status = 0;
+
+ /* save the new last alloc group so that the caller can cache it. */
+ if (last_alloc_group)
+ *last_alloc_group = ac->ac_last_group;
+
bail:
if (handle)
ocfs2_commit_trans(osb, handle);
@@ -535,6 +547,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
struct ocfs2_alloc_context *ac,
int type,
u32 slot,
+ u64 *last_alloc_group,
int flags)
{
int status;
@@ -600,7 +613,8 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
}
status = ocfs2_block_group_alloc(osb, alloc_inode, bh,
- ac->ac_max_block, flags);
+ ac->ac_max_block,
+ last_alloc_group, flags);
if (status < 0) {
if (status != -ENOSPC)
mlog_errno(status);
@@ -644,7 +658,7 @@ int ocfs2_reserve_new_metadata_blocks(struct ocfs2_super *osb,
status = ocfs2_reserve_suballoc_bits(osb, (*ac),
EXTENT_ALLOC_SYSTEM_INODE,
- slot, ALLOC_NEW_GROUP);
+ slot, NULL, ALLOC_NEW_GROUP);
if (status < 0) {
if (status != -ENOSPC)
mlog_errno(status);
@@ -690,7 +704,8 @@ static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
status = ocfs2_reserve_suballoc_bits(osb, ac,
INODE_ALLOC_SYSTEM_INODE,
- slot, NOT_ALLOC_NEW_GROUP);
+ slot, NULL,
+ NOT_ALLOC_NEW_GROUP);
if (status >= 0) {
ocfs2_set_inode_steal_slot(osb, slot);
break;
@@ -707,6 +722,7 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
{
int status;
s16 slot = ocfs2_get_inode_steal_slot(osb);
+ u64 alloc_group;
*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
if (!(*ac)) {
@@ -742,14 +758,22 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
goto inode_steal;
atomic_set(&osb->s_num_inodes_stolen, 0);
+ alloc_group = osb->osb_inode_alloc_group;
status = ocfs2_reserve_suballoc_bits(osb, *ac,
INODE_ALLOC_SYSTEM_INODE,
osb->slot_num,
+ &alloc_group,
ALLOC_NEW_GROUP |
ALLOC_GROUPS_FROM_GLOBAL);
if (status >= 0) {
status = 0;
+ spin_lock(&osb->osb_lock);
+ osb->osb_inode_alloc_group = alloc_group;
+ spin_unlock(&osb->osb_lock);
+ mlog(0, "after reservation, new allocation group is "
+ "%llu\n", (unsigned long long)alloc_group);
+
/*
* Some inodes must be freed by us, so try to allocate
* from our own next time.
@@ -796,7 +820,7 @@ int ocfs2_reserve_cluster_bitmap_bits(struct ocfs2_super *osb,
status = ocfs2_reserve_suballoc_bits(osb, ac,
GLOBAL_BITMAP_SYSTEM_INODE,
- OCFS2_INVALID_SLOT,
+ OCFS2_INVALID_SLOT, NULL,
ALLOC_NEW_GROUP);
if (status < 0 && status != -ENOSPC) {
mlog_errno(status);
--
1.5.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.
@ 2009-02-24 16:53 Tao Ma
2009-02-24 16:53 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
0 siblings, 1 reply; 14+ messages in thread
From: Tao Ma @ 2009-02-24 16:53 UTC (permalink / raw)
To: ocfs2-devel
In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.
So we add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming new inode,
we passed in directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/inode.c | 2 ++
fs/ocfs2/inode.h | 4 ++++
fs/ocfs2/namei.c | 4 ++--
fs/ocfs2/suballoc.c | 36 ++++++++++++++++++++++++++++++++++++
fs/ocfs2/suballoc.h | 2 ++
5 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index f1f77b2..4a88bce 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -352,6 +352,8 @@ void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
ocfs2_set_inode_flags(inode);
+ OCFS2_I(inode)->ip_last_used_slot = 0;
+ OCFS2_I(inode)->ip_last_used_group = 0;
mlog_exit_void();
}
diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index eb3c302..e1978ac 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -72,6 +72,10 @@ struct ocfs2_inode_info
struct inode vfs_inode;
struct jbd2_inode ip_jinode;
+
+ /* Only valid if the inode is the dir. */
+ u32 ip_last_used_slot;
+ u64 ip_last_used_group;
};
/*
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 7cb55ce..9363ddd 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -485,8 +485,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
*new_fe_bh = NULL;
- status = ocfs2_claim_new_inode(osb, handle, inode_ac, &suballoc_bit,
- &fe_blkno);
+ status = ocfs2_claim_new_inode(osb, handle, dir, parent_fe_bh,
+ inode_ac, &suballoc_bit, &fe_blkno);
if (status < 0) {
mlog_errno(status);
goto leave;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index a696286..487f00c 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1618,8 +1618,41 @@ bail:
return status;
}
+static void ocfs2_init_inode_ac_group(struct inode *dir,
+ struct buffer_head *parent_fe_bh,
+ struct ocfs2_alloc_context *ac)
+{
+ struct ocfs2_dinode *fe = (struct ocfs2_dinode *)parent_fe_bh->b_data;
+ /*
+ * Try to allocate inodes from some specific group.
+ *
+ * If the parent dir has recorded the last group used in allocation,
+ * cool, use it. Otherwise if we try to allocate new inode from the
+ * same slot the parent dir belongs to, use the same chunk.
+ *
+ * We are very careful here to avoid the mistake of setting
+ * ac_last_group to a group descriptor from a different (unlocked) slot.
+ */
+ if (OCFS2_I(dir)->ip_last_used_group &&
+ OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
+ ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
+ else if (le16_to_cpu(fe->i_suballoc_slot) == ac->ac_alloc_slot)
+ ac->ac_last_group = ocfs2_which_suballoc_group(
+ le64_to_cpu(fe->i_blkno),
+ le16_to_cpu(fe->i_suballoc_bit));
+}
+
+static inline void ocfs2_save_inode_ac_group(struct inode *dir,
+ struct ocfs2_alloc_context *ac)
+{
+ OCFS2_I(dir)->ip_last_used_group = ac->ac_last_group;
+ OCFS2_I(dir)->ip_last_used_slot = ac->ac_alloc_slot;
+}
+
int ocfs2_claim_new_inode(struct ocfs2_super *osb,
handle_t *handle,
+ struct inode *dir,
+ struct buffer_head *parent_fe_bh,
struct ocfs2_alloc_context *ac,
u16 *suballoc_bit,
u64 *fe_blkno)
@@ -1635,6 +1668,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
BUG_ON(ac->ac_bits_wanted != 1);
BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
+ ocfs2_init_inode_ac_group(dir, parent_fe_bh, ac);
+
status = ocfs2_claim_suballoc_bits(osb,
ac,
handle,
@@ -1653,6 +1688,7 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
*fe_blkno = bg_blkno + (u64) (*suballoc_bit);
ac->ac_bits_given++;
+ ocfs2_save_inode_ac_group(dir, ac);
status = 0;
bail:
mlog_exit(status);
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index e3c13c7..ea85a4c 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -88,6 +88,8 @@ int ocfs2_claim_metadata(struct ocfs2_super *osb,
u64 *blkno_start);
int ocfs2_claim_new_inode(struct ocfs2_super *osb,
handle_t *handle,
+ struct inode *dir,
+ struct buffer_head *parent_fe_bh,
struct ocfs2_alloc_context *ac,
u16 *suballoc_bit,
u64 *fe_blkno);
--
1.5.5
^ permalink raw reply related [flat|nested] 14+ messages in thread* [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
2009-02-24 16:53 [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
@ 2009-02-24 16:53 ` Tao Ma
0 siblings, 0 replies; 14+ messages in thread
From: Tao Ma @ 2009-02-24 16:53 UTC (permalink / raw)
To: ocfs2-devel
In ocfs2, the block group search looks for the "emptiest" group
to allocate from. So if the allocator has many equally(or almost
equally) empty groups, new block group will tend to get spread
out amongst them.
So we add osb_inode_alloc_group in ocfs2_super to record the last
used inode allocation group.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
I have done some basic test and the results are cool.
1. single node test:
first column is the result without inode allocation patches, and the
second one with inode allocation patched enabled. You see we have
great improvement with the second "ls -lR".
echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
real 0m20.548s 0m20.106s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null
real 0m13.965s 0m13.766s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time rm /mnt/ocfs2/linux-2.6.28/ -rf
real 0m13.198s 0m13.091s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
real 0m23.022s 0m21.360s
umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null
real 2m45.189s 0m15.019s(yes, that is it. :) )
2. Tested with 4 nodes(megabyte switch for both cross-node
communication and iscsi), with the same command sequence(using
openmpi to run the command simultaneously). Although we spend
a lot of time in cross-node communication, we still have some
performance improvement.
the 1st tar:
real 356.22s 357.70s
the 1st ls -lR:
real 187.33s 187.32s
the rm:
real 260.68s 262.42s
the 2nd tar:
real 371.92s 358.47s
the 2nd ls:
real 197.16s 188.36s
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/ocfs2.h | 3 +++
fs/ocfs2/suballoc.c | 32 ++++++++++++++++++++++++++++----
2 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index e24559a..a0c6a3d 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -340,6 +340,9 @@ struct ocfs2_super
unsigned int osb_dx_mask;
u32 osb_dx_seed[4];
+
+ /* the group we used to allocate inodes. */
+ u64 osb_inode_alloc_group;
};
#define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index b7a065e..4c1399c 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -66,6 +66,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
struct inode *alloc_inode,
struct buffer_head *bh,
u64 max_block,
+ u64 *last_alloc_group,
int flags);
static int ocfs2_cluster_group_search(struct inode *inode,
@@ -407,6 +408,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
struct inode *alloc_inode,
struct buffer_head *bh,
u64 max_block,
+ u64 *last_alloc_group,
int flags)
{
int status, credits;
@@ -444,6 +446,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
goto bail;
}
+ if (last_alloc_group && *last_alloc_group != 0) {
+ mlog(0, "use old allocation group %llu for block group alloc\n",
+ (unsigned long long)*last_alloc_group);
+ ac->ac_last_group = *last_alloc_group;
+ }
status = ocfs2_claim_clusters(osb,
handle,
ac,
@@ -518,6 +525,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
status = 0;
+
+ /* save the new last alloc group so that the caller can cache it. */
+ if (last_alloc_group)
+ *last_alloc_group = ac->ac_last_group;
+
bail:
if (handle)
ocfs2_commit_trans(osb, handle);
@@ -535,6 +547,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
struct ocfs2_alloc_context *ac,
int type,
u32 slot,
+ u64 *last_alloc_group,
int flags)
{
int status;
@@ -600,7 +613,8 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
}
status = ocfs2_block_group_alloc(osb, alloc_inode, bh,
- ac->ac_max_block, flags);
+ ac->ac_max_block,
+ last_alloc_group, flags);
if (status < 0) {
if (status != -ENOSPC)
mlog_errno(status);
@@ -644,7 +658,7 @@ int ocfs2_reserve_new_metadata_blocks(struct ocfs2_super *osb,
status = ocfs2_reserve_suballoc_bits(osb, (*ac),
EXTENT_ALLOC_SYSTEM_INODE,
- slot, ALLOC_NEW_GROUP);
+ slot, NULL, ALLOC_NEW_GROUP);
if (status < 0) {
if (status != -ENOSPC)
mlog_errno(status);
@@ -690,7 +704,8 @@ static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
status = ocfs2_reserve_suballoc_bits(osb, ac,
INODE_ALLOC_SYSTEM_INODE,
- slot, NOT_ALLOC_NEW_GROUP);
+ slot, NULL,
+ NOT_ALLOC_NEW_GROUP);
if (status >= 0) {
ocfs2_set_inode_steal_slot(osb, slot);
break;
@@ -707,6 +722,7 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
{
int status;
s16 slot = ocfs2_get_inode_steal_slot(osb);
+ u64 alloc_group;
*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
if (!(*ac)) {
@@ -742,14 +758,22 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
goto inode_steal;
atomic_set(&osb->s_num_inodes_stolen, 0);
+ alloc_group = osb->osb_inode_alloc_group;
status = ocfs2_reserve_suballoc_bits(osb, *ac,
INODE_ALLOC_SYSTEM_INODE,
osb->slot_num,
+ &alloc_group,
ALLOC_NEW_GROUP |
ALLOC_GROUPS_FROM_GLOBAL);
if (status >= 0) {
status = 0;
+ spin_lock(&osb->osb_lock);
+ osb->osb_inode_alloc_group = alloc_group;
+ spin_unlock(&osb->osb_lock);
+ mlog(0, "after reservation, new allocation group is "
+ "%llu\n", (unsigned long long)alloc_group);
+
/*
* Some inodes must be freed by us, so try to allocate
* from our own next time.
@@ -796,7 +820,7 @@ int ocfs2_reserve_cluster_bitmap_bits(struct ocfs2_super *osb,
status = ocfs2_reserve_suballoc_bits(osb, ac,
GLOBAL_BITMAP_SYSTEM_INODE,
- OCFS2_INVALID_SLOT,
+ OCFS2_INVALID_SLOT, NULL,
ALLOC_NEW_GROUP);
if (status < 0 && status != -ENOSPC) {
mlog_errno(status);
--
1.5.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-02-24 16:53 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-28 6:49 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Tao Ma
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
2009-01-06 19:02 ` Mark Fasheh
2009-01-07 0:29 ` Tao Ma
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap Tao Ma
2009-01-06 19:47 ` Mark Fasheh
2008-11-27 22:58 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
2009-01-06 21:23 ` Mark Fasheh
2009-01-07 0:49 ` Tao Ma
2009-01-07 1:03 ` Mark Fasheh
2009-01-06 21:41 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement Mark Fasheh
2009-01-07 0:27 ` Tao Ma
-- strict thread matches above, loose matches on Subject: below --
2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
2009-02-24 16:53 [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
2009-02-24 16:53 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.