* [Ocfs2-devel] [PATCH 0/3] Add inode stealing in ocfs2.V3
@ 2008-03-03  1:00 Tao Ma
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 1/3] Add a new parameter for ocfs2_reserve_suballoc_bits.V3 Tao Ma
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Tao Ma @ 2008-03-03  1:00 UTC (permalink / raw)
  To: ocfs2-devel

Hi all,
This patch series adds an inode stealing mechanism for inode allocation.

Changes from V2 to V3:
1. Add a new member that records how many times we have stolen inodes
   from other slots, so that we can go directly to inode stealing
   without trying our own slot every time.

In OCFS2, we allocate inodes from the slot-specific inode_alloc to avoid
inode creation congestion. The local alloc file grows in large contiguous
chunks. With a 4K block size, it grows by 4M each time, so 1024 inodes are
allocated at a time.
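
The figure of 1024 follows from dividing the chunk size by the block size. A minimal sketch of that arithmetic, assuming (as the numbers above imply) that each inode occupies exactly one filesystem block; the helper name is hypothetical and is not part of the patches:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of inodes gained each time the inode
 * allocator grows by one chunk. Assumes one inode per filesystem block.
 */
static uint32_t inodes_per_chunk(uint32_t chunk_bytes, uint32_t block_bytes)
{
	/* The chunk must be a whole number of blocks. */
	assert(block_bytes != 0 && chunk_bytes % block_bytes == 0);
	return chunk_bytes / block_bytes;
}
```

For a 4M chunk and a 4K block size, inodes_per_chunk(4 * 1024 * 1024, 4096) evaluates to 1024.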

Over time, if the fs gets fragmented enough (e.g. the user has created many
small files and then deleted some of them), we can end up in a situation
where we cannot extend the inode_alloc because there is no large free chunk
in the global_bitmap, even though df shows a few gigs free. More annoying is
that in this situation one invariably cannot create inodes on one node but
can on another. Still more annoying is that an unused slot may have space
for plenty of inodes that is unusable, because the user may no longer be
mounting as many nodes.

This patch series implements a solution: stealing inodes from another
slot. Two new variables are added for it:
1) ocfs2_super->s_inode_steal_slot. It is initialized as invalid and is
   only set valid when we successfully steal an inode from another slot.
   When we flush the truncate log, complete local alloc recovery or
   allocate from our own slot successfully, it is reset to invalid.
2) inode_steal_times. It records the number of times we try to steal an
   inode from other nodes. It is incremented whether or not the steal
   succeeds, and it is reset to zero when we try to allocate from our
   own slot.

With these two new variables, the whole inode allocation process is now:
1. Check whether ocfs2_super->s_inode_steal_slot is valid. If it is
   invalid, go to step 2, that is, try to allocate from our own slot. If
   it is valid, we must have stolen an inode successfully just now, so
   check whether inode_steal_times has reached its limit
   (OCFS2_INODE_STEAL_TIMES in patch 3/3). If yes, go to step 2, since we
   now need to try our own slot in case there is some space for us. If
   not, go to step 3 and steal from other nodes directly.
2. Allocate from our own inode_alloc:000X and zero inode_steal_times.
   1) If we can reserve, OK.
   2) If that fails, try to allocate a large chunk and reserve once again.
   3) If OK, clear ocfs2_super->s_inode_steal_slot and exit directly.
3. Try to allocate from other nodes.
   1) If ocfs2_super->s_inode_steal_slot is valid, start from that node;
      otherwise start from the node next to us. This time, just try to
      reserve in inode_alloc; we don't go to the global_bitmap if this
      node also can't allocate the inode.
   2) Try the next node until we reach the first steal slot again.
   3) If we succeed in one node's inode_alloc, set
      ocfs2_super->s_inode_steal_slot to it.
   4) Increase inode_steal_times.
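
The three steps can be modelled in plain userspace C. This is only a sketch of the decision flow under simplifying assumptions, not the kernel code: every name prefixed with model_ or MODEL_ is hypothetical, each slot's inode_alloc is reduced to a counter of free inodes, the global bitmap is reduced to a count of chunks still available, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_INVALID_SLOT (-1) /* stand-in for OCFS2_INVALID_SLOT */
#define MODEL_STEAL_LIMIT  10   /* stand-in for OCFS2_INODE_STEAL_TIMES */
#define MODEL_MAX_SLOTS    4    /* arbitrary slot count for the model */

struct model_sb {
	int slot_num;                     /* our own slot */
	int steal_slot;                   /* last slot stolen from, or invalid */
	int steal_times;                  /* steal attempts since we last tried our own slot */
	int free_inodes[MODEL_MAX_SLOTS]; /* free inodes in each slot's inode_alloc */
	int global_chunks;                /* chunks still free in the global bitmap */
};

/* Step 2: allocate from our own slot, growing it by one chunk if needed. */
static bool model_alloc_own(struct model_sb *sb)
{
	int *free_bits = &sb->free_inodes[sb->slot_num];

	if (*free_bits == 0 && sb->global_chunks > 0) {
		sb->global_chunks--;
		*free_bits = 1024;
	}
	if (*free_bits == 0)
		return false;
	(*free_bits)--;
	return true;
}

/* Step 3: walk the other slots in a circle; never grow their allocators. */
static bool model_steal(struct model_sb *sb)
{
	int start = sb->steal_slot != MODEL_INVALID_SLOT ?
		    sb->steal_slot : (sb->slot_num + 1) % MODEL_MAX_SLOTS;
	int slot = start;

	do {
		if (slot != sb->slot_num && sb->free_inodes[slot] > 0) {
			sb->free_inodes[slot]--;
			sb->steal_slot = slot; /* remember where stealing worked */
			return true;
		}
		slot = (slot + 1) % MODEL_MAX_SLOTS;
	} while (slot != start);

	return false;
}

/* Returns the slot the inode came from, or -1 when no slot has space. */
static int model_reserve_inode(struct model_sb *sb)
{
	bool stolen;

	assert(sb->slot_num >= 0 && sb->slot_num < MODEL_MAX_SLOTS);

	/* Step 1: try our own slot unless we are in the middle of a steal run. */
	if (sb->steal_slot == MODEL_INVALID_SLOT ||
	    sb->steal_times >= MODEL_STEAL_LIMIT) {
		sb->steal_times = 0;
		if (model_alloc_own(sb)) {
			sb->steal_slot = MODEL_INVALID_SLOT;
			return sb->slot_num;
		}
	}

	stolen = model_steal(sb);
	sb->steal_times++; /* counted whether or not the steal succeeded */
	return stolen ? sb->steal_slot : -1;
}
```

In this model, a node whose own slot is full and cannot grow steals from the first other slot with free inodes, keeps going back to that slot on later calls, and retries its own slot only once the counter reaches the limit.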

* [Ocfs2-devel] [PATCH 1/3] Add a new parameter for ocfs2_reserve_suballoc_bits.V3
  2008-03-03  1:00 [Ocfs2-devel] [PATCH 0/3] Add inode stealing in ocfs2.V3 Tao Ma
@ 2008-03-03  1:13 ` Tao Ma
  2008-03-03 18:32   ` Sunil Mushran
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 2/3] Add ac_alloc_slot in ocfs2_alloc_context.V3 Tao Ma
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3 Tao Ma
  2 siblings, 1 reply; 13+ messages in thread
From: Tao Ma @ 2008-03-03  1:13 UTC (permalink / raw)
  To: ocfs2-devel

In some cases (e.g. inode stealing from other nodes), we may not want
ocfs2_reserve_suballoc_bits to allocate new groups from the
global_bitmap, since it may already be full. Add a new parameter
to control this.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
---
 fs/ocfs2/suballoc.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 72c198a..3be4e73 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -46,6 +46,9 @@
 
 #include "buffer_head_io.h"
 
+#define NOT_ALLOC_NEW_GROUP		0
+#define ALLOC_NEW_GROUP			1
+
 static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg);
 static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe);
 static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl);
@@ -391,7 +394,8 @@ bail:
 static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 				       struct ocfs2_alloc_context *ac,
 				       int type,
-				       u32 slot)
+				       u32 slot,
+				       int alloc_new_group)
 {
 	int status;
 	u32 bits_wanted = ac->ac_bits_wanted;
@@ -446,6 +450,14 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 			goto bail;
 		}
 
+		if (alloc_new_group != ALLOC_NEW_GROUP) {
+			mlog(0, "Alloc File %u Full: wanted=%u, free_bits=%u, "
+			     "and we don't alloc a new group for it.\n",
+			     slot, bits_wanted, free_bits);
+			status = -ENOSPC;
+			goto bail;
+		}
+
 		status = ocfs2_block_group_alloc(osb, alloc_inode, bh);
 		if (status < 0) {
 			if (status != -ENOSPC)
@@ -490,7 +502,8 @@ int ocfs2_reserve_new_metadata(struct ocfs2_super *osb,
 	(*ac)->ac_group_search = ocfs2_block_group_search;
 
 	status = ocfs2_reserve_suballoc_bits(osb, (*ac),
-					     EXTENT_ALLOC_SYSTEM_INODE, slot);
+					     EXTENT_ALLOC_SYSTEM_INODE,
+					     slot, ALLOC_NEW_GROUP);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
@@ -527,7 +540,7 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
-					     osb->slot_num);
+					     osb->slot_num, ALLOC_NEW_GROUP);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
@@ -557,7 +570,8 @@ int ocfs2_reserve_cluster_bitmap_bits(struct ocfs2_super *osb,
 
 	status = ocfs2_reserve_suballoc_bits(osb, ac,
 					     GLOBAL_BITMAP_SYSTEM_INODE,
-					     OCFS2_INVALID_SLOT);
+					     OCFS2_INVALID_SLOT,
+					     ALLOC_NEW_GROUP);
 	if (status < 0 && status != -ENOSPC) {
 		mlog_errno(status);
 		goto bail;
-- 
1.5.3.GIT

* [Ocfs2-devel] [PATCH 2/3] Add ac_alloc_slot in ocfs2_alloc_context.V3
  2008-03-03  1:00 [Ocfs2-devel] [PATCH 0/3] Add inode stealing in ocfs2.V3 Tao Ma
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 1/3] Add a new parameter for ocfs2_reserve_suballoc_bits.V3 Tao Ma
@ 2008-03-03  1:13 ` Tao Ma
  2008-03-03 18:25   ` Sunil Mushran
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3 Tao Ma
  2 siblings, 1 reply; 13+ messages in thread
From: Tao Ma @ 2008-03-03  1:13 UTC (permalink / raw)
  To: ocfs2-devel

With inode stealing, we no longer restrict allocation to the
local node. So it is necessary to add a new member to
ocfs2_alloc_context to indicate which slot we are using for
allocation. We also modify the local alloc path so that this
member is set there as well.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
 fs/ocfs2/localalloc.c |    2 ++
 fs/ocfs2/suballoc.c   |    1 +
 fs/ocfs2/suballoc.h   |    1 +
 3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index add1ffd..80d1c75 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -526,6 +526,8 @@ int ocfs2_reserve_local_alloc_bits(struct ocfs2_super *osb,
 	}
 
 	ac->ac_inode = local_alloc_inode;
+	/* We should never use localalloc from another slot */
+	ac->ac_alloc_slot = osb->slot_num;
 	ac->ac_which = OCFS2_AC_USE_LOCAL;
 	get_bh(osb->local_alloc_bh);
 	ac->ac_bh = osb->local_alloc_bh;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 3be4e73..33d5573 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -424,6 +424,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 	}
 
 	ac->ac_inode = alloc_inode;
+	ac->ac_alloc_slot = slot;
 
 	fe = (struct ocfs2_dinode *) bh->b_data;
 	if (!OCFS2_IS_VALID_DINODE(fe)) {
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index 8799033..544c600 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -36,6 +36,7 @@ typedef int (group_search_t)(struct inode *,
 struct ocfs2_alloc_context {
 	struct inode *ac_inode;    /* which bitmap are we allocating from? */
 	struct buffer_head *ac_bh; /* file entry bh */
+	u32    ac_alloc_slot;   /* which slot are we allocating from? */
 	u32    ac_bits_wanted;
 	u32    ac_bits_given;
 #define OCFS2_AC_USE_LOCAL 1
-- 
1.5.3.GIT

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3
  2008-03-03  1:00 [Ocfs2-devel] [PATCH 0/3] Add inode stealing in ocfs2.V3 Tao Ma
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 1/3] Add a new parameter for ocfs2_reserve_suballoc_bits.V3 Tao Ma
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 2/3] Add ac_alloc_slot in ocfs2_alloc_context.V3 Tao Ma
@ 2008-03-03  1:13 ` Tao Ma
  2008-03-03 18:25   ` Sunil Mushran
  2 siblings, 1 reply; 13+ messages in thread
From: Tao Ma @ 2008-03-03  1:13 UTC (permalink / raw)
  To: ocfs2-devel

Two new variables are added:
1) ocfs2_super->s_inode_steal_slot. It is initialized as invalid and is
   only set valid when we successfully steal an inode from another slot.
   When we flush the truncate log, complete local alloc recovery or
   allocate from our own slot successfully, it is reset to invalid.
2) inode_steal_times. It records the number of times we try to steal an
   inode from other nodes. It is incremented whether or not the steal
   succeeds, and it is reset to zero when we try to allocate from our
   own slot.

With these two new variables, the whole inode allocation process is now:
1. Check whether ocfs2_super->s_inode_steal_slot is valid. If it is
   invalid, go to step 2, that is, try to allocate from our own slot. If
   it is valid, we must have stolen an inode successfully just now, so
   check whether inode_steal_times has reached its limit
   (OCFS2_INODE_STEAL_TIMES). If yes, go to step 2, since we now need to
   try our own slot in case there is some space for us. If not, go to
   step 3 and steal from other nodes directly.
2. Allocate from our own inode_alloc:000X and zero inode_steal_times.
   1) If we can reserve, OK.
   2) If that fails, try to allocate a large chunk and reserve once again.
   3) If OK, clear ocfs2_super->s_inode_steal_slot and exit directly.
3. Try to allocate from other nodes.
   1) If ocfs2_super->s_inode_steal_slot is valid, start from that node;
      otherwise start from the node next to us. This time, just try to
      reserve in inode_alloc; we don't go to the global_bitmap if this
      node also can't allocate the inode.
   2) Try the next node until we reach the first steal slot again.
   3) If we succeed in one node's inode_alloc, set
      ocfs2_super->s_inode_steal_slot to it.
   4) Increase inode_steal_times.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
 fs/ocfs2/alloc.c      |    2 +
 fs/ocfs2/localalloc.c |    2 +
 fs/ocfs2/namei.c      |    2 +-
 fs/ocfs2/ocfs2.h      |   32 +++++++++++++++++-
 fs/ocfs2/suballoc.c   |   83 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/ocfs2/super.c      |    1 +
 6 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 447206e..f333cdc 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -4788,6 +4788,8 @@ static void ocfs2_truncate_log_worker(struct work_struct *work)
 	status = ocfs2_flush_truncate_log(osb);
 	if (status < 0)
 		mlog_errno(status);
+	else
+		ocfs2_init_inode_steal_slot(osb);
 
 	mlog_exit(status);
 }
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 80d1c75..ca3bf0c 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -450,6 +450,8 @@ out_mutex:
 	iput(main_bm_inode);
 
 out:
+	if (!status)
+		ocfs2_init_inode_steal_slot(osb);
 	mlog_exit(status);
 	return status;
 }
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index ae9ad95..ab5a227 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -424,7 +424,7 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
 	fe->i_fs_generation = cpu_to_le32(osb->fs_generation);
 	fe->i_blkno = cpu_to_le64(fe_blkno);
 	fe->i_suballoc_bit = cpu_to_le16(suballoc_bit);
-	fe->i_suballoc_slot = cpu_to_le16(osb->slot_num);
+	fe->i_suballoc_slot = cpu_to_le16(inode_ac->ac_alloc_slot);
 	fe->i_uid = cpu_to_le32(current->fsuid);
 	if (dir->i_mode & S_ISGID) {
 		fe->i_gid = cpu_to_le32(dir->i_gid);
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6546cef..2278ec0 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -206,11 +206,13 @@ struct ocfs2_super
 	u32 s_feature_incompat;
 	u32 s_feature_ro_compat;
 
-	/* Protects s_next_generaion, osb_flags. Could protect more on
-	 * osb as it's very short lived. */
+	/* Protects s_next_generation, osb_flags and s_inode_steal_slot.
+	 * Could protect more on osb as it's very short lived.
+	 */
 	spinlock_t osb_lock;
 	u32 s_next_generation;
 	unsigned long osb_flags;
+	s16 s_inode_steal_slot;
 
 	unsigned long s_mount_opt;
 	unsigned int s_atime_quantum;
@@ -522,6 +524,32 @@ static inline unsigned int ocfs2_pages_per_cluster(struct super_block *sb)
 	return pages_per_cluster;
 }
 
+static inline void ocfs2_init_inode_steal_slot(struct ocfs2_super *osb)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = OCFS2_INVALID_SLOT;
+	spin_unlock(&osb->osb_lock);
+}
+
+static inline void ocfs2_set_inode_steal_slot(struct ocfs2_super *osb,
+					      u16 slot)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = slot;
+	spin_unlock(&osb->osb_lock);
+}
+
+static inline u16 ocfs2_get_inode_steal_slot(struct ocfs2_super *osb)
+{
+	s16 slot;
+
+	spin_lock(&osb->osb_lock);
+	slot = osb->s_inode_steal_slot;
+	spin_unlock(&osb->osb_lock);
+
+	return slot;
+}
+
 #define ocfs2_set_bit ext2_set_bit
 #define ocfs2_clear_bit ext2_clear_bit
 #define ocfs2_test_bit ext2_test_bit
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 33d5573..dad4b70 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -109,7 +109,7 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
 						u64 *bg_blkno,
 						u16 *bg_bit_off);
 
-void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+static void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac)
 {
 	struct inode *inode = ac->ac_inode;
 
@@ -120,9 +120,17 @@ void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
 		mutex_unlock(&inode->i_mutex);
 
 		iput(inode);
+		ac->ac_inode = NULL;
 	}
-	if (ac->ac_bh)
+	if (ac->ac_bh) {
 		brelse(ac->ac_bh);
+		ac->ac_bh = NULL;
+	}
+}
+
+void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+{
+	ocfs2_free_ac_resource(ac);
 	kfree(ac);
 }
 
@@ -522,10 +530,47 @@ bail:
 	return status;
 }
 
+static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
+					      struct ocfs2_alloc_context *ac)
+{
+	int status = -ENOSPC, i;
+	s16 slot = ocfs2_get_inode_steal_slot(osb);
+
+	/*
+	 * This is the first time we try to steal inodes, so begin
+	 * with the node next to us first.
+	 */
+	if (slot == OCFS2_INVALID_SLOT)
+		slot = osb->slot_num + 1;
+
+	for (i = 0; i < osb->max_slots; i++, slot++) {
+		if (slot == osb->max_slots)
+			slot = 0;
+
+		if (slot == osb->slot_num)
+			continue;
+
+		status = ocfs2_reserve_suballoc_bits(osb, ac,
+						     INODE_ALLOC_SYSTEM_INODE,
+						     slot, NOT_ALLOC_NEW_GROUP);
+		if (status >= 0) {
+			ocfs2_set_inode_steal_slot(osb, slot);
+			break;
+		}
+
+		ocfs2_free_ac_resource(ac);
+	}
+
+	return status;
+}
+
 int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 			    struct ocfs2_alloc_context **ac)
 {
 	int status;
+	s16 slot = ocfs2_get_inode_steal_slot(osb);
+#define OCFS2_INODE_STEAL_TIMES 10
+	static atomic_t inode_steal_times = ATOMIC_INIT(0);
 
 	*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
 	if (!(*ac)) {
@@ -539,9 +584,43 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 
 	(*ac)->ac_group_search = ocfs2_block_group_search;
 
+	/*
+	 * slot is set when we successfully steal inode from other nodes.
+	 * It is reset in 3 places:
+	 * 1. when we flush the turncate log
+	 * 2. when we complete local alloc recovery.
+	 * 3. when we successfully allocate from our own slot.
+	 * After it is set, we will go on stealing inodes until we find the
+	 * need to check our slots to see wether there is some space for us.
+	 */
+	if (slot != OCFS2_INVALID_SLOT &&
+	    atomic_read(&inode_steal_times) < OCFS2_INODE_STEAL_TIMES)
+		goto inode_steal;
+
+	atomic_set(&inode_steal_times, 0);
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
 					     osb->slot_num, ALLOC_NEW_GROUP);
+	if (status >= 0) {
+		status = 0;
+
+		/*
+		 * Some inodes must be freed by us, so try to allocate
+		 * from our own next time.
+		 */
+		if (slot != OCFS2_INVALID_SLOT)
+			ocfs2_init_inode_steal_slot(osb);
+		goto bail;
+	} else if (status < 0 && status != -ENOSPC) {
+		mlog_errno(status);
+		goto bail;
+	}
+
+	ocfs2_free_ac_resource(*ac);
+
+inode_steal:
+	status = ocfs2_steal_inode_from_other_nodes(osb, *ac);
+	atomic_inc(&inode_steal_times);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bec75af..c4e82c7 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1193,6 +1193,7 @@ static int ocfs2_mount_volume(struct super_block *sb)
 		mlog_errno(status);
 		goto leave;
 	}
+	ocfs2_init_inode_steal_slot(osb);
 
 	/* load all node-local system inodes */
 	status = ocfs2_init_local_system_inodes(osb);
-- 
1.5.3.GIT

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3 Tao Ma
@ 2008-03-03 18:25   ` Sunil Mushran
  2008-03-03 18:47     ` Tao Ma
  2008-03-04  0:55     ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V4 Tao Ma
  0 siblings, 2 replies; 13+ messages in thread
From: Sunil Mushran @ 2008-03-03 18:25 UTC (permalink / raw)
  To: ocfs2-devel

Tao Ma wrote:
> +static inline void ocfs2_set_inode_steal_slot(struct ocfs2_super *osb,
> +					      u16 slot)
> +{
> +	spin_lock(&osb->osb_lock);
> +	osb->s_inode_steal_slot = slot;
> +	spin_unlock(&osb->osb_lock);
> +}

Shouldn't slot be s16 and not u16?

>  
> +static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
> +					      struct ocfs2_alloc_context *ac)
> +{
> +	int status = -ENOSPC, i;
> +	s16 slot = ocfs2_get_inode_steal_slot(osb);
> +
> +	/*
> +	 * This is the first time we try to steal inodes, so begin
> +	 * with the node next to us first.
> +	 */

Change comment to:
/* Start to steal inodes from the first slot after ours. */

> +	if (slot == OCFS2_INVALID_SLOT)
> +		slot = osb->slot_num + 1;
> +
> +	for (i = 0; i < osb->max_slots; i++, slot++) {
> +		if (slot == osb->max_slots)
> +			slot = 0;
> +
> +		if (slot == osb->slot_num)
> +			continue;
> +
> +		status = ocfs2_reserve_suballoc_bits(osb, ac,
> +						     INODE_ALLOC_SYSTEM_INODE,
> +						     slot, NOT_ALLOC_NEW_GROUP);
> +		if (status >= 0) {
> +			ocfs2_set_inode_steal_slot(osb, slot);
> +			break;
> +		}
> +
> +		ocfs2_free_ac_resource(ac);
> +	}
> +
> +	return status;
> +}
> +
if (slot == OCFS2_INVALID_SLOT)
	slot = (osb->slot_num + 1 == osb->max_slots) ? 0 : osb->slot_num + 1;

while (slot != osb->slot_num) {
	status = ocfs2_reserve_suballoc_bits(osb, ....);
	...
	ocfs2_free_ac_resource(ac);
	slot = (slot + 1 == osb->max_slots) ? 0 : slot + 1;
}

Not that your code is incorrect. Another option is to not use "i"
and replace the continue with a break.

>  int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
>  			    struct ocfs2_alloc_context **ac)
>  {
>  	int status;
> +	s16 slot = ocfs2_get_inode_steal_slot(osb);
> +#define OCFS2_INODE_STEAL_TIMES 10
> +	static atomic_t inode_steal_times = ATOMIC_INIT(0);

Can you rename inode_steal_times to num_inodes_stolen? BTW, as a
function-local static, this counter is shared by all mounts, which is
not what we want. Why don't you add it to ocfs2_super as
s_num_inodes_stolen? In that case, protect it using osb_lock.
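
The difference between a function-local static and a per-mount field can be shown with a small userspace model. This is a sketch only: struct mount_model and both helpers are hypothetical names, and the locking that the in-kernel field would need (osb_lock, as suggested above) is left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-mount superblock structure. */
struct mount_model {
	int num_inodes_stolen; /* one counter per mount */
};

/*
 * A static local has a single instance for the whole program, so every
 * caller, and therefore every mounted volume, bumps the same counter.
 */
static int bump_shared_counter(void)
{
	static int inode_steal_times;

	return ++inode_steal_times;
}

/* A field in the per-mount structure is counted separately for each mount. */
static int bump_per_mount_counter(struct mount_model *m)
{
	assert(m);
	return ++m->num_inodes_stolen;
}
```

With two mounts a and b, two calls to bump_shared_counter() return 1 and then 2 regardless of which mount triggered them, while bump_per_mount_counter(&a) and bump_per_mount_counter(&b) each return 1.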

>  
>  	*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
>  	if (!(*ac)) {
> @@ -539,9 +584,43 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
>  
>  	(*ac)->ac_group_search = ocfs2_block_group_search;
>  
> +	/*
> +	 * slot is set when we successfully steal inode from other nodes.
> +	 * It is reset in 3 places:
> +	 * 1. when we flush the turncate log

spelling... truncate

> +	 * 2. when we complete local alloc recovery.
> +	 * 3. when we successfully allocate from our own slot.
> +	 * After it is set, we will go on stealing inodes until we find the
> +	 * need to check our slots to see wether there is some space for us.
> +	 */

spelling... whether

> +	if (slot != OCFS2_INVALID_SLOT &&
> +	    atomic_read(&inode_steal_times) < OCFS2_INODE_STEAL_TIMES)
> +		goto inode_steal;
> +
> +	atomic_set(&inode_steal_times, 0);
>  	status = ocfs2_reserve_suballoc_bits(osb, *ac,
>  					     INODE_ALLOC_SYSTEM_INODE,
>  					     osb->slot_num, ALLOC_NEW_GROUP);
> +	if (status >= 0) {
> +		status = 0;
> +
> +		/*
> +		 * Some inodes must be freed by us, so try to allocate
> +		 * from our own next time.
> +		 */
> +		if (slot != OCFS2_INVALID_SLOT)
> +			ocfs2_init_inode_steal_slot(osb);
> +		goto bail;
> +	} else if (status < 0 && status != -ENOSPC) {
> +		mlog_errno(status);
> +		goto bail;
> +	}
> +
> +	ocfs2_free_ac_resource(*ac);
> +
> +inode_steal:
> +	status = ocfs2_steal_inode_from_other_nodes(osb, *ac);
> +	atomic_inc(&inode_steal_times);
>  	if (status < 0) {
>  		if (status != -ENOSPC)
>  			mlog_errno(status);
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index bec75af..c4e82c7 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1193,6 +1193,7 @@ static int ocfs2_mount_volume(struct super_block *sb)
>  		mlog_errno(status);
>  		goto leave;
>  	}
> +	ocfs2_init_inode_steal_slot(osb);
>  
>  	/* load all node-local system inodes */
>  	status = ocfs2_init_local_system_inodes(osb);

* [Ocfs2-devel] [PATCH 2/3] Add ac_alloc_slot in ocfs2_alloc_context.V3
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 2/3] Add ac_alloc_slot in ocfs2_alloc_context.V3 Tao Ma
@ 2008-03-03 18:25   ` Sunil Mushran
  0 siblings, 0 replies; 13+ messages in thread
From: Sunil Mushran @ 2008-03-03 18:25 UTC (permalink / raw)
  To: ocfs2-devel

Tao Ma wrote:
> In inode stealing, we no longer restrict the allocation to
> happen in the local node. So it is neccessary for us to add
> a new member in ocfs2_alloc_context to indicate which slot
> we are using for allocation. We also modify the process of
> local alloc so that this member can be used there also.
>
> Signed-off-by: Tao Ma <tao.ma@oracle.com>
>   

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

> ---
>  fs/ocfs2/localalloc.c |    2 ++
>  fs/ocfs2/suballoc.c   |    1 +
>  fs/ocfs2/suballoc.h   |    1 +
>  3 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> index add1ffd..80d1c75 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -526,6 +526,8 @@ int ocfs2_reserve_local_alloc_bits(struct ocfs2_super *osb,
>  	}
>  
>  	ac->ac_inode = local_alloc_inode;
> +	/* We should never use localalloc from another slot */
> +	ac->ac_alloc_slot = osb->slot_num;
>  	ac->ac_which = OCFS2_AC_USE_LOCAL;
>  	get_bh(osb->local_alloc_bh);
>  	ac->ac_bh = osb->local_alloc_bh;
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 3be4e73..33d5573 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -424,6 +424,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
>  	}
>  
>  	ac->ac_inode = alloc_inode;
> +	ac->ac_alloc_slot = slot;
>  
>  	fe = (struct ocfs2_dinode *) bh->b_data;
>  	if (!OCFS2_IS_VALID_DINODE(fe)) {
> diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
> index 8799033..544c600 100644
> --- a/fs/ocfs2/suballoc.h
> +++ b/fs/ocfs2/suballoc.h
> @@ -36,6 +36,7 @@ typedef int (group_search_t)(struct inode *,
>  struct ocfs2_alloc_context {
>  	struct inode *ac_inode;    /* which bitmap are we allocating from? */
>  	struct buffer_head *ac_bh; /* file entry bh */
> +	u32    ac_alloc_slot;   /* which slot are we allocating from? */
>  	u32    ac_bits_wanted;
>  	u32    ac_bits_given;
>  #define OCFS2_AC_USE_LOCAL 1
>   

* [Ocfs2-devel] [PATCH 1/3] Add a new parameter for ocfs2_reserve_suballoc_bits.V3
  2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 1/3] Add a new parameter for ocfs2_reserve_suballoc_bits.V3 Tao Ma
@ 2008-03-03 18:32   ` Sunil Mushran
  0 siblings, 0 replies; 13+ messages in thread
From: Sunil Mushran @ 2008-03-03 18:32 UTC (permalink / raw)
  To: ocfs2-devel

Tao Ma wrote:
> In some cases(Inode stealing from other nodes), we may not want
> ocfs2_reserve_suballoc_bits to allocate new groups from the
> global_bitmap since it may already be full. So add a new parameter
> for this.
>
> Signed-off-by: Tao Ma <tao.ma@oracle.com>
> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
>   

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

> ---
>  fs/ocfs2/suballoc.c |   22 ++++++++++++++++++----
>  1 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 72c198a..3be4e73 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -46,6 +46,9 @@
>  
>  #include "buffer_head_io.h"
>  
> +#define NOT_ALLOC_NEW_GROUP		0
> +#define ALLOC_NEW_GROUP			1
> +
>  static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg);
>  static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe);
>  static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl);
> @@ -391,7 +394,8 @@ bail:
>  static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
>  				       struct ocfs2_alloc_context *ac,
>  				       int type,
> -				       u32 slot)
> +				       u32 slot,
> +				       int alloc_new_group)
>  {
>  	int status;
>  	u32 bits_wanted = ac->ac_bits_wanted;
> @@ -446,6 +450,14 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
>  			goto bail;
>  		}
>  
> +		if (alloc_new_group != ALLOC_NEW_GROUP) {
> +			mlog(0, "Alloc File %u Full: wanted=%u, free_bits=%u, "
> +			     "and we don't alloc a new group for it.\n",
> +			     slot, bits_wanted, free_bits);
> +			status = -ENOSPC;
> +			goto bail;
> +		}
> +
>  		status = ocfs2_block_group_alloc(osb, alloc_inode, bh);
>  		if (status < 0) {
>  			if (status != -ENOSPC)
> @@ -490,7 +502,8 @@ int ocfs2_reserve_new_metadata(struct ocfs2_super *osb,
>  	(*ac)->ac_group_search = ocfs2_block_group_search;
>  
>  	status = ocfs2_reserve_suballoc_bits(osb, (*ac),
> -					     EXTENT_ALLOC_SYSTEM_INODE, slot);
> +					     EXTENT_ALLOC_SYSTEM_INODE,
> +					     slot, ALLOC_NEW_GROUP);
>  	if (status < 0) {
>  		if (status != -ENOSPC)
>  			mlog_errno(status);
> @@ -527,7 +540,7 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
>  
>  	status = ocfs2_reserve_suballoc_bits(osb, *ac,
>  					     INODE_ALLOC_SYSTEM_INODE,
> -					     osb->slot_num);
> +					     osb->slot_num, ALLOC_NEW_GROUP);
>  	if (status < 0) {
>  		if (status != -ENOSPC)
>  			mlog_errno(status);
> @@ -557,7 +570,8 @@ int ocfs2_reserve_cluster_bitmap_bits(struct ocfs2_super *osb,
>  
>  	status = ocfs2_reserve_suballoc_bits(osb, ac,
>  					     GLOBAL_BITMAP_SYSTEM_INODE,
> -					     OCFS2_INVALID_SLOT);
> +					     OCFS2_INVALID_SLOT,
> +					     ALLOC_NEW_GROUP);
>  	if (status < 0 && status != -ENOSPC) {
>  		mlog_errno(status);
>  		goto bail;
>   

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3
  2008-03-03 18:25   ` Sunil Mushran
@ 2008-03-03 18:47     ` Tao Ma
  2008-03-03 18:57       ` Sunil Mushran
  2008-03-04  0:55     ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V4 Tao Ma
  1 sibling, 1 reply; 13+ messages in thread
From: Tao Ma @ 2008-03-03 18:47 UTC (permalink / raw)
  To: ocfs2-devel

Sunil Mushran wrote:
> Change comment to:
> /* Start to steal inodes from the first slot after ours. */
> 
>> +    if (slot == OCFS2_INVALID_SLOT)
>> +        slot = osb->slot_num + 1;
>> +
>> +    for (i = 0; i < osb->max_slots; i++, slot++) {
>> +        if (slot == osb->max_slots)
>> +            slot = 0;
>> +
>> +        if (slot == osb->slot_num)
>> +            continue;
>> +
>> +        status = ocfs2_reserve_suballoc_bits(osb, ac,
>> +                             INODE_ALLOC_SYSTEM_INODE,
>> +                             slot, NOT_ALLOC_NEW_GROUP);
>> +        if (status >= 0) {
>> +            ocfs2_set_inode_steal_slot(osb, slot);
>> +            break;
>> +        }
>> +
>> +        ocfs2_free_ac_resource(ac);
>> +    }
>> +
>> +    return status;
>> +}
>> +
> if (slot == OCFS2_INVALID_SLOT)
> slot = (osb->slot_num + 1 == osb->max_slots) ? 0 : osb->slot_num + 1;
> 
> while (slot != osb->slot_num) {
> status = ocfs2_reserve_suballoc_bits(osb, ....);
> ...
> ocfs2_free_ac_resource(ac);
> slot = (slot + 1 == osb->max_slots) ? 0 : slot + 1;
> }
> 
> Not that your code is incorrect. Other option is to not use "i"
> and replace continue with a break.
I use "i" because I am always worried that I won't handle the
condition carefully. :( So do you think using "i" is not graceful, or not
easy to understand, or is there some other reason?
> 
>>  int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
>>                  struct ocfs2_alloc_context **ac)
>>  {
>>      int status;
>> +    s16 slot = ocfs2_get_inode_steal_slot(osb);
>> +#define OCFS2_INODE_STEAL_TIMES 10
>> +    static atomic_t inode_steal_times = ATOMIC_INIT(0);
> 
> Can you rename inode_steal_times to num_inodes_stolen. BTW, this
> will be for all mounts which is not what we want. Why don't you
> add this to ocfs2_super? s_num_inodes_stolen. In that case, protect
> it using osb_lock.
I forgot about this issue and thought it was only used in this
function. :( I will modify it. Thanks.

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3
  2008-03-03 18:47     ` Tao Ma
@ 2008-03-03 18:57       ` Sunil Mushran
  2008-03-03 19:21         ` Tao Ma
  0 siblings, 1 reply; 13+ messages in thread
From: Sunil Mushran @ 2008-03-03 18:57 UTC (permalink / raw)
  To: ocfs2-devel

Tao Ma wrote:
> I use "i" because I am always worried about if I can't handle the 
> condition carefully. :( So you think using "i" is not gracefully, or 
> not easy to understand, or any other reason?
Mostly unnecessary. We are going circular and want to stop
when we reach our slot.

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3
  2008-03-03 18:57       ` Sunil Mushran
@ 2008-03-03 19:21         ` Tao Ma
  0 siblings, 0 replies; 13+ messages in thread
From: Tao Ma @ 2008-03-03 19:21 UTC (permalink / raw)
  To: ocfs2-devel



Sunil Mushran wrote:
> Tao Ma wrote:
>> I used "i" because I always worry that I won't handle the loop condition
>> correctly. :( Do you think using "i" is inelegant, hard to understand, or
>> is there some other reason?
> Mostly unnecessary. We are going circular and want to stop
> when we reach our slot.
> 
OK, I will use "while".
One more concern: we should stop when we get back to the first slot we
tested, not when we reach our own slot, so the condition should be
"while (slot != start_slot)".
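
As a rough illustration of that termination rule, here is a minimal
userspace sketch (not the patch itself). The helper name scan_other_slots
and its out[] parameter are hypothetical, and the sketch assumes that
start_slot has already been wrapped into the range 0..max_slots-1.

```c
#include <stddef.h>

/*
 * Hypothetical userspace sketch, not the ocfs2 code: walk the slots in a
 * circle beginning at start_slot, skip own_slot, and stop on returning to
 * start_slot. Each visited slot is written to out[]; the return value is
 * the number of slots visited. Assumes 0 <= start_slot < max_slots.
 */
static size_t scan_other_slots(int own_slot, int start_slot, int max_slots,
			       int *out)
{
	int slot = start_slot;
	size_t n = 0;

	do {
		if (slot != own_slot)
			out[n++] = slot;	/* a steal attempt would go here */

		if (++slot == max_slots)	/* wrap around past the last slot */
			slot = 0;
	} while (slot != start_slot);

	return n;
}
```

With own_slot = 1, start_slot = 2 and max_slots = 4, the walk visits
slots 2, 3 and 0 and then stops, so our own slot is never tried and no
slot is tried twice.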

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V4
  2008-03-03 18:25   ` Sunil Mushran
  2008-03-03 18:47     ` Tao Ma
@ 2008-03-04  0:55     ` Tao Ma
  2008-03-04 16:48       ` Mark Fasheh
  1 sibling, 1 reply; 13+ messages in thread
From: Tao Ma @ 2008-03-04  0:55 UTC (permalink / raw)
  To: ocfs2-devel

Modified as Sunil suggested.

Two new variables are added. They are:
1) ocfs2_super->s_inode_steal_slot. It is initialized as invalid and only
   set valid when we successfully steal an inode from another slot. When
   we flush the truncate log, complete local alloc recovery or allocate
   from our own slot successfully, it is reset to invalid.
2) ocfs2_super->s_num_inodes_stolen. It records the number of times we
   have tried to steal an inode from other nodes, and it is increased
   whether or not the steal succeeds. It is reset to zero when we try to
   allocate from our own slot or when s_inode_steal_slot is reset.

With these two new variables, the whole inode allocation process is now:
1. Check whether s_inode_steal_slot is valid. If it is invalid, go to
   step 2, that is, try to allocate from our own slot. If it is valid,
   we must have stolen an inode successfully not long ago, so check
   whether s_num_inodes_stolen has reached OCFS2_NUM_INODES_STOLEN. If
   yes, go to step 2, since we now need to retry our own slot in case
   there is some space for us. If not, go to step 3 and steal from other
   nodes directly.
2. Zero s_num_inodes_stolen and allocate from our own inode_alloc:000X.
   1) If we can reserve, OK.
   2) If that fails, try to allocate a large chunk and reserve once again.
   3) If OK, clear s_inode_steal_slot and exit directly.
3. Try to allocate from other nodes.
   1) If s_inode_steal_slot is valid, start from that node; otherwise
      start from the node next to us. This time we only try to reserve
      in its inode_alloc; we don't go to the global_bitmap if that node
      can't allocate the inode either.
   2) Try the node next to it until we reach the first steal slot again.
   3) If we succeed in one node's inode_alloc, set s_inode_steal_slot to it.
   4) Increase s_num_inodes_stolen.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
 fs/ocfs2/alloc.c      |    2 +
 fs/ocfs2/localalloc.c |    2 +
 fs/ocfs2/namei.c      |    2 +-
 fs/ocfs2/ocfs2.h      |   36 +++++++++++++++++++++-
 fs/ocfs2/suballoc.c   |   80 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/ocfs2/super.c      |    1 +
 6 files changed, 118 insertions(+), 5 deletions(-)
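
For reviewers, the three steps in the description above can be modelled
as the following userspace sketch. It is only an approximation under
stated assumptions: struct steal_state, reserve_inode() and the
own_has_space / donor_slot parameters are hypothetical stand-ins for the
outcome of the real reservation calls, and STEAL_LIMIT mirrors the value
of OCFS2_NUM_INODES_STOLEN in this version of the patch.

```c
#include <stdbool.h>

#define INVALID_SLOT	(-1)
#define STEAL_LIMIT	10	/* mirrors OCFS2_NUM_INODES_STOLEN here */

/* Hypothetical model of the two new ocfs2_super fields. */
struct steal_state {
	int steal_slot;		/* models s_inode_steal_slot */
	int num_stolen;		/* models s_num_inodes_stolen */
};

enum alloc_source { FROM_OWN_SLOT, FROM_OTHER_SLOT, NO_SPACE };

/*
 * own_has_space: would a reservation in our own inode_alloc succeed?
 * donor_slot:    the slot a steal would succeed in, or INVALID_SLOT if
 *                no other slot has a free inode.
 */
static enum alloc_source reserve_inode(struct steal_state *st,
				       bool own_has_space, int donor_slot)
{
	/* Step 1: keep stealing only while a steal slot is set and the
	 * attempt counter is still below the limit. */
	if (st->steal_slot == INVALID_SLOT || st->num_stolen >= STEAL_LIMIT) {
		/* Step 2: zero the counter and try our own slot. */
		st->num_stolen = 0;
		if (own_has_space) {
			st->steal_slot = INVALID_SLOT;
			return FROM_OWN_SLOT;
		}
	}

	/* Step 3: the counter goes up whether or not the steal works. */
	st->num_stolen++;
	if (donor_slot == INVALID_SLOT)
		return NO_SPACE;

	st->steal_slot = donor_slot;
	return FROM_OTHER_SLOT;
}
```

In this model, after one successful steal the next nine requests go
straight to the donor slot even if our own slot has space again; the
request after that retries our own slot first.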

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 447206e..f333cdc 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -4788,6 +4788,8 @@ static void ocfs2_truncate_log_worker(struct work_struct *work)
 	status = ocfs2_flush_truncate_log(osb);
 	if (status < 0)
 		mlog_errno(status);
+	else
+		ocfs2_init_inode_steal_slot(osb);
 
 	mlog_exit(status);
 }
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 80d1c75..ca3bf0c 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -450,6 +450,8 @@ out_mutex:
 	iput(main_bm_inode);
 
 out:
+	if (!status)
+		ocfs2_init_inode_steal_slot(osb);
 	mlog_exit(status);
 	return status;
 }
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index ae9ad95..ab5a227 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -424,7 +424,7 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
 	fe->i_fs_generation = cpu_to_le32(osb->fs_generation);
 	fe->i_blkno = cpu_to_le64(fe_blkno);
 	fe->i_suballoc_bit = cpu_to_le16(suballoc_bit);
-	fe->i_suballoc_slot = cpu_to_le16(osb->slot_num);
+	fe->i_suballoc_slot = cpu_to_le16(inode_ac->ac_alloc_slot);
 	fe->i_uid = cpu_to_le32(current->fsuid);
 	if (dir->i_mode & S_ISGID) {
 		fe->i_gid = cpu_to_le32(dir->i_gid);
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6546cef..e9ad7f9 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -178,6 +178,8 @@ enum ocfs2_mount_options
 #define OCFS2_OSB_ERROR_FS	0x0004
 #define OCFS2_DEFAULT_ATIME_QUANTUM	60
 
+#define OCFS2_NUM_INODES_STOLEN 10
+
 struct ocfs2_journal;
 struct ocfs2_super
 {
@@ -206,11 +208,14 @@ struct ocfs2_super
 	u32 s_feature_incompat;
 	u32 s_feature_ro_compat;
 
-	/* Protects s_next_generaion, osb_flags. Could protect more on
-	 * osb as it's very short lived. */
+	/* Protects s_next_generation, osb_flags and s_inode_steal_slot.
+	 * Could protect more on osb as it's very short lived.
+	 */
 	spinlock_t osb_lock;
 	u32 s_next_generation;
 	unsigned long osb_flags;
+	s16 s_inode_steal_slot;
+	atomic_t s_num_inodes_stolen;
 
 	unsigned long s_mount_opt;
 	unsigned int s_atime_quantum;
@@ -522,6 +527,33 @@ static inline unsigned int ocfs2_pages_per_cluster(struct super_block *sb)
 	return pages_per_cluster;
 }
 
+static inline void ocfs2_init_inode_steal_slot(struct ocfs2_super *osb)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = OCFS2_INVALID_SLOT;
+	spin_unlock(&osb->osb_lock);
+	atomic_set(&osb->s_num_inodes_stolen, 0);
+}
+
+static inline void ocfs2_set_inode_steal_slot(struct ocfs2_super *osb,
+					      s16 slot)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = slot;
+	spin_unlock(&osb->osb_lock);
+}
+
+static inline s16 ocfs2_get_inode_steal_slot(struct ocfs2_super *osb)
+{
+	s16 slot;
+
+	spin_lock(&osb->osb_lock);
+	slot = osb->s_inode_steal_slot;
+	spin_unlock(&osb->osb_lock);
+
+	return slot;
+}
+
 #define ocfs2_set_bit ext2_set_bit
 #define ocfs2_clear_bit ext2_clear_bit
 #define ocfs2_test_bit ext2_test_bit
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 33d5573..2fcefab 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -109,7 +109,7 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
 						u64 *bg_blkno,
 						u16 *bg_bit_off);
 
-void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+static void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac)
 {
 	struct inode *inode = ac->ac_inode;
 
@@ -120,9 +120,17 @@ void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
 		mutex_unlock(&inode->i_mutex);
 
 		iput(inode);
+		ac->ac_inode = NULL;
 	}
-	if (ac->ac_bh)
+	if (ac->ac_bh) {
 		brelse(ac->ac_bh);
+		ac->ac_bh = NULL;
+	}
+}
+
+void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+{
+	ocfs2_free_ac_resource(ac);
 	kfree(ac);
 }
 
@@ -522,10 +530,44 @@ bail:
 	return status;
 }
 
+static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
+					      struct ocfs2_alloc_context *ac)
+{
+	int status = -ENOSPC;
+	s16 start_slot, slot = ocfs2_get_inode_steal_slot(osb);
+
+	/* Start to steal inodes from the first slot after ours. */
+	if (slot == OCFS2_INVALID_SLOT)
+		slot = osb->slot_num + 1;
+
+	start_slot = slot;
+	do {
+		if (slot == osb->max_slots)
+			slot = 0;
+
+		if (slot == osb->slot_num)
+			continue;
+
+		status = ocfs2_reserve_suballoc_bits(osb, ac,
+						     INODE_ALLOC_SYSTEM_INODE,
+						     slot, NOT_ALLOC_NEW_GROUP);
+		if (status >= 0) {
+			ocfs2_set_inode_steal_slot(osb, slot);
+			break;
+		}
+
+		ocfs2_free_ac_resource(ac);
+
+	} while (++slot != start_slot);
+
+	return status;
+}
+
 int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 			    struct ocfs2_alloc_context **ac)
 {
 	int status;
+	s16 slot = ocfs2_get_inode_steal_slot(osb);
 
 	*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
 	if (!(*ac)) {
@@ -539,9 +581,43 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 
 	(*ac)->ac_group_search = ocfs2_block_group_search;
 
+	/*
+	 * slot is set when we successfully steal inode from other nodes.
+	 * It is reset in 3 places:
+	 * 1. when we flush the truncate log
+	 * 2. when we complete local alloc recovery.
+	 * 3. when we successfully allocate from our own slot.
+	 * After it is set, we will go on stealing inodes until we find the
+	 * need to check our slots to see whether there is some space for us.
+	 */
+	if (slot != OCFS2_INVALID_SLOT &&
+	    atomic_read(&osb->s_num_inodes_stolen) < OCFS2_NUM_INODES_STOLEN)
+		goto inode_steal;
+
+	atomic_set(&osb->s_num_inodes_stolen, 0);
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
 					     osb->slot_num, ALLOC_NEW_GROUP);
+	if (status >= 0) {
+		status = 0;
+
+		/*
+		 * Some inodes must be freed by us, so try to allocate
+		 * from our own next time.
+		 */
+		if (slot != OCFS2_INVALID_SLOT)
+			ocfs2_init_inode_steal_slot(osb);
+		goto bail;
+	} else if (status < 0 && status != -ENOSPC) {
+		mlog_errno(status);
+		goto bail;
+	}
+
+	ocfs2_free_ac_resource(*ac);
+
+inode_steal:
+	status = ocfs2_steal_inode_from_other_nodes(osb, *ac);
+	atomic_inc(&osb->s_num_inodes_stolen);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bec75af..c4e82c7 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1193,6 +1193,7 @@ static int ocfs2_mount_volume(struct super_block *sb)
 		mlog_errno(status);
 		goto leave;
 	}
+	ocfs2_init_inode_steal_slot(osb);
 
 	/* load all node-local system inodes */
 	status = ocfs2_init_local_system_inodes(osb);
-- 
1.5.3.GIT

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V4
  2008-03-04  0:55     ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V4 Tao Ma
@ 2008-03-04 16:48       ` Mark Fasheh
  2008-03-05  0:13         ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V5 Tao Ma
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Fasheh @ 2008-03-04 16:48 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Mar 04, 2008 at 04:55:25PM +0800, tao.ma wrote:
> Modified as Sunil suggested.
> 
> Two new variables are added. They are:
> 1) ocfs2_super->s_inode_steal_slot. It is initialized as invalid and only
>    set valid when we successfully steal an inode from another slot. When
>    we flush the truncate log, complete local alloc recovery or allocate
>    from our own slot successfully, it is reset to invalid.
> 2) ocfs2_super->s_num_inodes_stolen. It records the number of times we
>    have tried to steal an inode from other nodes, and it is increased
>    whether or not the steal succeeds. It is reset to zero when we try to
>    allocate from our own slot or when s_inode_steal_slot is reset.
> 
> With these two new variables, the whole inode allocation process is now:
> 1. Check whether s_inode_steal_slot is valid. If it is invalid, go to
>    step 2, that is, try to allocate from our own slot. If it is valid,
>    we must have stolen an inode successfully not long ago, so check
>    whether s_num_inodes_stolen has reached OCFS2_NUM_INODES_STOLEN. If
>    yes, go to step 2, since we now need to retry our own slot in case
>    there is some space for us. If not, go to step 3 and steal from other
>    nodes directly.
> 2. Zero s_num_inodes_stolen and allocate from our own inode_alloc:000X.
>    1) If we can reserve, OK.
>    2) If that fails, try to allocate a large chunk and reserve once again.
>    3) If OK, clear s_inode_steal_slot and exit directly.
> 3. Try to allocate from other nodes.
>    1) If s_inode_steal_slot is valid, start from that node; otherwise
>       start from the node next to us. This time we only try to reserve
>       in its inode_alloc; we don't go to the global_bitmap if that node
>       can't allocate the inode either.
>    2) Try the node next to it until we reach the first steal slot again.
>    3) If we succeed in one node's inode_alloc, set s_inode_steal_slot to it.
>    4) Increase s_num_inodes_stolen.
> 
> Signed-off-by: Tao Ma <tao.ma@oracle.com>

Ok, this is all in ocfs2.git now. I made the following changes though, so
please verify.
	--Mark

--
Mark Fasheh
Principal Software Developer, Oracle
mark.fasheh@oracle.com

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index e9ad7f9..0273a55 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -178,8 +178,6 @@ enum ocfs2_mount_options
 #define OCFS2_OSB_ERROR_FS	0x0004
 #define OCFS2_DEFAULT_ATIME_QUANTUM	60
 
-#define OCFS2_NUM_INODES_STOLEN 10
-
 struct ocfs2_journal;
 struct ocfs2_super
 {
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 2fcefab..2a61875 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -49,6 +49,8 @@
 #define NOT_ALLOC_NEW_GROUP		0
 #define ALLOC_NEW_GROUP			1
 
+#define OCFS2_MAX_INODES_TO_STEAL	1024
+
 static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg);
 static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe);
 static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl);
@@ -591,7 +593,7 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 	 * need to check our slots to see whether there is some space for us.
 	 */
 	if (slot != OCFS2_INVALID_SLOT &&
-	    atomic_read(&osb->s_num_inodes_stolen) < OCFS2_NUM_INODES_STOLEN)
+	    atomic_read(&osb->s_num_inodes_stolen) < OCFS2_MAX_INODES_TO_STEAL)
 		goto inode_steal;
 
 	atomic_set(&osb->s_num_inodes_stolen, 0);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index c4e82c7..c4e7c97 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1405,6 +1405,8 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	osb->local_alloc_state = OCFS2_LA_UNUSED;
 	osb->local_alloc_bh = NULL;
 
+	atomic_set(&osb->s_num_inodes_stolen, 0);
+
 	ocfs2_setup_hb_callbacks(osb);
 
 	init_waitqueue_head(&osb->osb_mount_event);

* [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V5
  2008-03-04 16:48       ` Mark Fasheh
@ 2008-03-05  0:13         ` Tao Ma
  0 siblings, 0 replies; 13+ messages in thread
From: Tao Ma @ 2008-03-05  0:13 UTC (permalink / raw)
  To: ocfs2-devel

Modification from V4 to V5:
1) Restore the function ocfs2_steal_inode_from_other_nodes.
2) Move ocfs2_init_inode_steal_slot from ocfs2_mount_volume to
   ocfs2_initialize_super.
3) Rename OCFS2_NUM_INODES_STOLEN to OCFS2_MAX_INODES_TO_STEAL and move
   it to fs/ocfs2/suballoc.c.

Two new variables are added. They are:
1) ocfs2_super->s_inode_steal_slot. It is initialized as invalid and only
   set valid when we successfully steal an inode from another slot. When
   we flush the truncate log, complete local alloc recovery or allocate
   from our own slot successfully, it is reset to invalid.
2) ocfs2_super->s_num_inodes_stolen. It records the number of times we
   have tried to steal an inode from other nodes, and it is increased
   whether or not the steal succeeds. It is reset to zero when we try to
   allocate from our own slot or when s_inode_steal_slot is reset.

With these two new variables, the whole inode allocation process is now:
1. Check whether s_inode_steal_slot is valid. If it is invalid, go to
   step 2, that is, try to allocate from our own slot. If it is valid,
   we must have stolen an inode successfully not long ago, so check
   whether s_num_inodes_stolen has reached OCFS2_MAX_INODES_TO_STEAL. If
   yes, go to step 2, since we now need to retry our own slot in case
   there is some space for us. If not, go to step 3 and steal from other
   nodes directly.
2. Zero s_num_inodes_stolen and allocate from our own inode_alloc:000X.
   1) If we can reserve, OK.
   2) If that fails, try to allocate a large chunk and reserve once again.
   3) If OK, clear s_inode_steal_slot and exit directly.
3. Try to allocate from other nodes.
   1) If s_inode_steal_slot is valid, start from that node; otherwise
      start from the node next to us. This time we only try to reserve
      in its inode_alloc; we don't go to the global_bitmap if that node
      can't allocate the inode either.
   2) Try the node next to it until we reach the first steal slot again.
   3) If we succeed in one node's inode_alloc, set s_inode_steal_slot to it.
   4) Increase s_num_inodes_stolen.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
 fs/ocfs2/alloc.c      |    2 +
 fs/ocfs2/localalloc.c |    2 +
 fs/ocfs2/namei.c      |    2 +-
 fs/ocfs2/ocfs2.h      |   34 +++++++++++++++++++-
 fs/ocfs2/suballoc.c   |   80 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/ocfs2/super.c      |    1 +
 6 files changed, 116 insertions(+), 5 deletions(-)
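
The reset rules for the two fields can be tried outside the kernel with
a sketch like the one below. It is a userspace analogue under stated
assumptions, not the kernel code: a C11 atomic_flag spin loop stands in
for osb_lock, atomic_int stands in for the kernel's atomic_t, and every
name in it (struct steal_state, steal_state_reset() and so on) is
hypothetical.

```c
#include <stdatomic.h>

#define INVALID_SLOT	(-1)

/* Hypothetical userspace stand-in for the new ocfs2_super fields. */
struct steal_state {
	atomic_flag lock;	/* stand-in for osb_lock */
	short steal_slot;	/* stand-in for s_inode_steal_slot */
	atomic_int num_stolen;	/* stand-in for s_num_inodes_stolen */
};

static void state_lock(struct steal_state *st)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&st->lock,
						 memory_order_acquire))
		;
}

static void state_unlock(struct steal_state *st)
{
	atomic_flag_clear_explicit(&st->lock, memory_order_release);
}

/* Invalidate the steal slot and zero the counter together. */
static void steal_state_reset(struct steal_state *st)
{
	state_lock(st);
	st->steal_slot = INVALID_SLOT;
	state_unlock(st);
	atomic_store(&st->num_stolen, 0);
}

static void steal_state_set_slot(struct steal_state *st, short slot)
{
	state_lock(st);
	st->steal_slot = slot;
	state_unlock(st);
}

static short steal_state_get_slot(struct steal_state *st)
{
	short slot;

	state_lock(st);
	slot = st->steal_slot;
	state_unlock(st);

	return slot;
}
```

As in the patch, the slot is only read or written under the lock, while
the counter is a separate atomic that is cleared right after the lock is
dropped.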

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 447206e..f333cdc 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -4788,6 +4788,8 @@ static void ocfs2_truncate_log_worker(struct work_struct *work)
 	status = ocfs2_flush_truncate_log(osb);
 	if (status < 0)
 		mlog_errno(status);
+	else
+		ocfs2_init_inode_steal_slot(osb);
 
 	mlog_exit(status);
 }
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 80d1c75..ca3bf0c 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -450,6 +450,8 @@ out_mutex:
 	iput(main_bm_inode);
 
 out:
+	if (!status)
+		ocfs2_init_inode_steal_slot(osb);
 	mlog_exit(status);
 	return status;
 }
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index ae9ad95..ab5a227 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -424,7 +424,7 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
 	fe->i_fs_generation = cpu_to_le32(osb->fs_generation);
 	fe->i_blkno = cpu_to_le64(fe_blkno);
 	fe->i_suballoc_bit = cpu_to_le16(suballoc_bit);
-	fe->i_suballoc_slot = cpu_to_le16(osb->slot_num);
+	fe->i_suballoc_slot = cpu_to_le16(inode_ac->ac_alloc_slot);
 	fe->i_uid = cpu_to_le32(current->fsuid);
 	if (dir->i_mode & S_ISGID) {
 		fe->i_gid = cpu_to_le32(dir->i_gid);
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6546cef..0273a55 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -206,11 +206,14 @@ struct ocfs2_super
 	u32 s_feature_incompat;
 	u32 s_feature_ro_compat;
 
-	/* Protects s_next_generaion, osb_flags. Could protect more on
-	 * osb as it's very short lived. */
+	/* Protects s_next_generation, osb_flags and s_inode_steal_slot.
+	 * Could protect more on osb as it's very short lived.
+	 */
 	spinlock_t osb_lock;
 	u32 s_next_generation;
 	unsigned long osb_flags;
+	s16 s_inode_steal_slot;
+	atomic_t s_num_inodes_stolen;
 
 	unsigned long s_mount_opt;
 	unsigned int s_atime_quantum;
@@ -522,6 +525,33 @@ static inline unsigned int ocfs2_pages_per_cluster(struct super_block *sb)
 	return pages_per_cluster;
 }
 
+static inline void ocfs2_init_inode_steal_slot(struct ocfs2_super *osb)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = OCFS2_INVALID_SLOT;
+	spin_unlock(&osb->osb_lock);
+	atomic_set(&osb->s_num_inodes_stolen, 0);
+}
+
+static inline void ocfs2_set_inode_steal_slot(struct ocfs2_super *osb,
+					      s16 slot)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = slot;
+	spin_unlock(&osb->osb_lock);
+}
+
+static inline s16 ocfs2_get_inode_steal_slot(struct ocfs2_super *osb)
+{
+	s16 slot;
+
+	spin_lock(&osb->osb_lock);
+	slot = osb->s_inode_steal_slot;
+	spin_unlock(&osb->osb_lock);
+
+	return slot;
+}
+
 #define ocfs2_set_bit ext2_set_bit
 #define ocfs2_clear_bit ext2_clear_bit
 #define ocfs2_test_bit ext2_test_bit
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 33d5573..d2d278f 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -49,6 +49,8 @@
 #define NOT_ALLOC_NEW_GROUP		0
 #define ALLOC_NEW_GROUP			1
 
+#define OCFS2_MAX_INODES_TO_STEAL	1024
+
 static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg);
 static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe);
 static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl);
@@ -109,7 +111,7 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
 						u64 *bg_blkno,
 						u16 *bg_bit_off);
 
-void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+static void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac)
 {
 	struct inode *inode = ac->ac_inode;
 
@@ -120,9 +122,17 @@ void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
 		mutex_unlock(&inode->i_mutex);
 
 		iput(inode);
+		ac->ac_inode = NULL;
 	}
-	if (ac->ac_bh)
+	if (ac->ac_bh) {
 		brelse(ac->ac_bh);
+		ac->ac_bh = NULL;
+	}
+}
+
+void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+{
+	ocfs2_free_ac_resource(ac);
 	kfree(ac);
 }
 
@@ -522,10 +532,42 @@ bail:
 	return status;
 }
 
+static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
+					      struct ocfs2_alloc_context *ac)
+{
+	int i, status = -ENOSPC;
+	s16 slot = ocfs2_get_inode_steal_slot(osb);
+
+	/* Start to steal inodes from the first slot after ours. */
+	if (slot == OCFS2_INVALID_SLOT)
+		slot = osb->slot_num + 1;
+
+	for (i = 0; i < osb->max_slots; i++, slot++) {
+		if (slot == osb->max_slots)
+			slot = 0;
+
+		if (slot == osb->slot_num)
+			continue;
+
+		status = ocfs2_reserve_suballoc_bits(osb, ac,
+						     INODE_ALLOC_SYSTEM_INODE,
+						     slot, NOT_ALLOC_NEW_GROUP);
+		if (status >= 0) {
+			ocfs2_set_inode_steal_slot(osb, slot);
+			break;
+		}
+
+		ocfs2_free_ac_resource(ac);
+	}
+
+	return status;
+}
+
 int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 			    struct ocfs2_alloc_context **ac)
 {
 	int status;
+	s16 slot = ocfs2_get_inode_steal_slot(osb);
 
 	*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
 	if (!(*ac)) {
@@ -539,9 +581,43 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 
 	(*ac)->ac_group_search = ocfs2_block_group_search;
 
+	/*
+	 * slot is set when we successfully steal inode from other nodes.
+	 * It is reset in 3 places:
+	 * 1. when we flush the truncate log
+	 * 2. when we complete local alloc recovery.
+	 * 3. when we successfully allocate from our own slot.
+	 * After it is set, we will go on stealing inodes until we find the
+	 * need to check our slots to see whether there is some space for us.
+	 */
+	if (slot != OCFS2_INVALID_SLOT &&
+	    atomic_read(&osb->s_num_inodes_stolen) < OCFS2_MAX_INODES_TO_STEAL)
+		goto inode_steal;
+
+	atomic_set(&osb->s_num_inodes_stolen, 0);
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
 					     osb->slot_num, ALLOC_NEW_GROUP);
+	if (status >= 0) {
+		status = 0;
+
+		/*
+		 * Some inodes must be freed by us, so try to allocate
+		 * from our own next time.
+		 */
+		if (slot != OCFS2_INVALID_SLOT)
+			ocfs2_init_inode_steal_slot(osb);
+		goto bail;
+	} else if (status < 0 && status != -ENOSPC) {
+		mlog_errno(status);
+		goto bail;
+	}
+
+	ocfs2_free_ac_resource(*ac);
+
+inode_steal:
+	status = ocfs2_steal_inode_from_other_nodes(osb, *ac);
+	atomic_inc(&osb->s_num_inodes_stolen);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bec75af..94c7a69 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1376,6 +1376,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	INIT_LIST_HEAD(&osb->blocked_lock_list);
 	osb->blocked_lock_count = 0;
 	spin_lock_init(&osb->osb_lock);
+	ocfs2_init_inode_steal_slot(osb);
 
 	atomic_set(&osb->alloc_stats.moves, 0);
 	atomic_set(&osb->alloc_stats.local_data, 0);
-- 
1.5.3.GIT

Thread overview: 13+ messages
2008-03-03  1:00 [Ocfs2-devel] [PATCH 0/3] Add inode stealing in ocfs2.V3 Tao Ma
2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 1/3] Add a new parameter for ocfs2_reserve_suballoc_bits.V3 Tao Ma
2008-03-03 18:32   ` Sunil Mushran
2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 2/3] Add ac_alloc_slot in ocfs2_alloc_context.V3 Tao Ma
2008-03-03 18:25   ` Sunil Mushran
2008-03-03  1:13 ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V3 Tao Ma
2008-03-03 18:25   ` Sunil Mushran
2008-03-03 18:47     ` Tao Ma
2008-03-03 18:57       ` Sunil Mushran
2008-03-03 19:21         ` Tao Ma
2008-03-04  0:55     ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V4 Tao Ma
2008-03-04 16:48       ` Mark Fasheh
2008-03-05  0:13         ` [Ocfs2-devel] [PATCH 3/3] Add inode stealing for ocfs2_reserve_new_inode.V5 Tao Ma
