All of lore.kernel.org
 help / color / mirror / Atom feed
* [Ocfs2-devel] Four patches for ocfs2 1.4
@ 2008-07-07 20:26 Sunil Mushran
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: Silence an error message in ocfs2_file_aio_read() Sunil Mushran
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Sunil Mushran @ 2008-07-07 20:26 UTC (permalink / raw)
  To: ocfs2-devel

Four patches for review. The second patch is from the userspace
clusterstack changes that are already in mainline. It was backported
for applying the third patch that fixes a mount/reco hang.

Sunil

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 1/4] ocfs2: Silence an error message in ocfs2_file_aio_read()
  2008-07-07 20:26 [Ocfs2-devel] Four patches for ocfs2 1.4 Sunil Mushran
@ 2008-07-07 20:26 ` Sunil Mushran
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 2/4] ocfs2: Change the recovery map to an array of node numbers Sunil Mushran
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Sunil Mushran @ 2008-07-07 20:26 UTC (permalink / raw)
  To: ocfs2-devel

This patch silences an EINVAL error message in ocfs2_file_aio_read()
that is always due to a user error.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index d4cf056..79c4ee1 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2450,7 +2450,7 @@ static ssize_t __ocfs2_file_aio_read(struct kiocb *iocb,
 
 	ret = kapi_generic_file_aio_read(iocb, iov, nr_segs, iocb->ki_pos);
 	if (ret == -EINVAL)
-		mlog(ML_ERROR, "generic_file_aio_read returned -EINVAL\n");
+		mlog(0, "generic_file_aio_read returned -EINVAL\n");
 
 	/* buffered aio wouldn't have proper lock coverage today */
 	BUG_ON(ret == -EIOCBQUEUED && !(filp->f_flags & O_DIRECT));
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 2/4] ocfs2: Change the recovery map to an array of node numbers.
  2008-07-07 20:26 [Ocfs2-devel] Four patches for ocfs2 1.4 Sunil Mushran
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: Silence an error message in ocfs2_file_aio_read() Sunil Mushran
@ 2008-07-07 20:26 ` Sunil Mushran
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery Sunil Mushran
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 4/4] ocfs2/dlm: Fixes oops in dlm_new_lockres() Sunil Mushran
  3 siblings, 0 replies; 11+ messages in thread
From: Sunil Mushran @ 2008-07-07 20:26 UTC (permalink / raw)
  To: ocfs2-devel

From: Joel Becker <joel.becker@oracle.com>

Mainline commit 553abd046af609191a91af7289d87d477adc659f

The old recovery map was a bitmap of node numbers.  This was sufficient
for the maximum node number of 254.  Going forward, we want node numbers
to be UINT32.  Thus, we need a new recovery map.

Note that we can't keep track of slots here.  We must write down the
node number to recovery *before* we get the locks needed to convert a
node number into a slot number.

The recovery map is now an array of unsigned ints, max_slots in size.
It moves to journal.c with the rest of recovery.

Because it needs to be initialized, we move all of recovery initialization
into a new function, ocfs2_recovery_init().  This actually cleans up
ocfs2_initialize_super() a little as well.  Following on, recovery cleaup
becomes part of ocfs2_recovery_exit().

A number of node map functions are rendered obsolete and are removed.

Finally, waiting on recovery is wrapped in a function rather than naked
checks on the recovery_event.  This is a cleanup from Mark.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |    6 +-
 fs/ocfs2/heartbeat.c |  111 ------------------------------
 fs/ocfs2/heartbeat.h |   14 ----
 fs/ocfs2/journal.c   |  181 +++++++++++++++++++++++++++++++++++++++++++++----
 fs/ocfs2/journal.h   |    4 +
 fs/ocfs2/ocfs2.h     |    3 +-
 fs/ocfs2/super.c     |   33 ++-------
 7 files changed, 182 insertions(+), 170 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index d7ae5cd..adb1d9d 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1950,8 +1950,7 @@ int ocfs2_inode_lock_full(struct inode *inode,
 		goto local;
 
 	if (!(arg_flags & OCFS2_META_LOCK_RECOVERY))
-		wait_event(osb->recovery_event,
-			   ocfs2_node_map_is_empty(osb, &osb->recovery_map));
+		ocfs2_wait_for_recovery(osb);
 
 	lockres = &OCFS2_I(inode)->ip_inode_lockres;
 	level = ex ? LKM_EXMODE : LKM_PRMODE;
@@ -1974,8 +1973,7 @@ int ocfs2_inode_lock_full(struct inode *inode,
 	 * committed to owning this lock so we don't allow signals to
 	 * abort the operation. */
 	if (!(arg_flags & OCFS2_META_LOCK_RECOVERY))
-		wait_event(osb->recovery_event,
-			   ocfs2_node_map_is_empty(osb, &osb->recovery_map));
+		ocfs2_wait_for_recovery(osb);
 
 local:
 	/*
diff --git a/fs/ocfs2/heartbeat.c b/fs/ocfs2/heartbeat.c
index 0758daf..80de239 100644
--- a/fs/ocfs2/heartbeat.c
+++ b/fs/ocfs2/heartbeat.c
@@ -48,7 +48,6 @@ static inline void __ocfs2_node_map_set_bit(struct ocfs2_node_map *map,
 					    int bit);
 static inline void __ocfs2_node_map_clear_bit(struct ocfs2_node_map *map,
 					      int bit);
-static inline int __ocfs2_node_map_is_empty(struct ocfs2_node_map *map);
 
 /* special case -1 for now
  * TODO: should *really* make sure the calling func never passes -1!!  */
@@ -62,7 +61,6 @@ static void ocfs2_node_map_init(struct ocfs2_node_map *map)
 void ocfs2_init_node_maps(struct ocfs2_super *osb)
 {
 	spin_lock_init(&osb->node_map_lock);
-	ocfs2_node_map_init(&osb->recovery_map);
 	ocfs2_node_map_init(&osb->osb_recovering_orphan_dirs);
 }
 
@@ -192,112 +190,3 @@ int ocfs2_node_map_test_bit(struct ocfs2_super *osb,
 	return ret;
 }
 
-static inline int __ocfs2_node_map_is_empty(struct ocfs2_node_map *map)
-{
-	int bit;
-	bit = find_next_bit(map->map, map->num_nodes, 0);
-	if (bit < map->num_nodes)
-		return 0;
-	return 1;
-}
-
-int ocfs2_node_map_is_empty(struct ocfs2_super *osb,
-			    struct ocfs2_node_map *map)
-{
-	int ret;
-	BUG_ON(map->num_nodes == 0);
-	spin_lock(&osb->node_map_lock);
-	ret = __ocfs2_node_map_is_empty(map);
-	spin_unlock(&osb->node_map_lock);
-	return ret;
-}
-
-#if 0
-
-static void __ocfs2_node_map_dup(struct ocfs2_node_map *target,
-				 struct ocfs2_node_map *from)
-{
-	BUG_ON(from->num_nodes == 0);
-	ocfs2_node_map_init(target);
-	__ocfs2_node_map_set(target, from);
-}
-
-/* returns 1 if bit is the only bit set in target, 0 otherwise */
-int ocfs2_node_map_is_only(struct ocfs2_super *osb,
-			   struct ocfs2_node_map *target,
-			   int bit)
-{
-	struct ocfs2_node_map temp;
-	int ret;
-
-	spin_lock(&osb->node_map_lock);
-	__ocfs2_node_map_dup(&temp, target);
-	__ocfs2_node_map_clear_bit(&temp, bit);
-	ret = __ocfs2_node_map_is_empty(&temp);
-	spin_unlock(&osb->node_map_lock);
-
-	return ret;
-}
-
-static void __ocfs2_node_map_set(struct ocfs2_node_map *target,
-				 struct ocfs2_node_map *from)
-{
-	int num_longs, i;
-
-	BUG_ON(target->num_nodes != from->num_nodes);
-	BUG_ON(target->num_nodes == 0);
-
-	num_longs = BITS_TO_LONGS(target->num_nodes);
-	for (i = 0; i < num_longs; i++)
-		target->map[i] = from->map[i];
-}
-
-#endif  /*  0  */
-
-/* Returns whether the recovery bit was actually set - it may not be
- * if a node is still marked as needing recovery */
-int ocfs2_recovery_map_set(struct ocfs2_super *osb,
-			   int num)
-{
-	int set = 0;
-
-	spin_lock(&osb->node_map_lock);
-
-	if (!test_bit(num, osb->recovery_map.map)) {
-	    __ocfs2_node_map_set_bit(&osb->recovery_map, num);
-	    set = 1;
-	}
-
-	spin_unlock(&osb->node_map_lock);
-
-	return set;
-}
-
-void ocfs2_recovery_map_clear(struct ocfs2_super *osb,
-			      int num)
-{
-	ocfs2_node_map_clear_bit(osb, &osb->recovery_map, num);
-}
-
-int ocfs2_node_map_iterate(struct ocfs2_super *osb,
-			   struct ocfs2_node_map *map,
-			   int idx)
-{
-	int i = idx;
-
-	idx = O2NM_INVALID_NODE_NUM;
-	spin_lock(&osb->node_map_lock);
-	if ((i != O2NM_INVALID_NODE_NUM) &&
-	    (i >= 0) &&
-	    (i < map->num_nodes)) {
-		while(i < map->num_nodes) {
-			if (test_bit(i, map->map)) {
-				idx = i;
-				break;
-			}
-			i++;
-		}
-	}
-	spin_unlock(&osb->node_map_lock);
-	return idx;
-}
diff --git a/fs/ocfs2/heartbeat.h b/fs/ocfs2/heartbeat.h
index eac63ae..98d8ffc 100644
--- a/fs/ocfs2/heartbeat.h
+++ b/fs/ocfs2/heartbeat.h
@@ -33,8 +33,6 @@ void ocfs2_stop_heartbeat(struct ocfs2_super *osb);
 
 /* node map functions - used to keep track of mounted and in-recovery
  * nodes. */
-int ocfs2_node_map_is_empty(struct ocfs2_super *osb,
-			    struct ocfs2_node_map *map);
 void ocfs2_node_map_set_bit(struct ocfs2_super *osb,
 			    struct ocfs2_node_map *map,
 			    int bit);
@@ -44,17 +42,5 @@ void ocfs2_node_map_clear_bit(struct ocfs2_super *osb,
 int ocfs2_node_map_test_bit(struct ocfs2_super *osb,
 			    struct ocfs2_node_map *map,
 			    int bit);
-int ocfs2_node_map_iterate(struct ocfs2_super *osb,
-			   struct ocfs2_node_map *map,
-			   int idx);
-static inline int ocfs2_node_map_first_set_bit(struct ocfs2_super *osb,
-					       struct ocfs2_node_map *map)
-{
-	return ocfs2_node_map_iterate(osb, map, 0);
-}
-int ocfs2_recovery_map_set(struct ocfs2_super *osb,
-			   int num);
-void ocfs2_recovery_map_clear(struct ocfs2_super *osb,
-			      int num);
 
 #endif /* OCFS2_HEARTBEAT_H */
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 741a4e2..c39eb12 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -64,6 +64,137 @@ static int ocfs2_recover_orphans(struct ocfs2_super *osb,
 				 int slot);
 static int ocfs2_commit_thread(void *arg);
 
+
+/*
+ * The recovery_list is a simple linked list of node numbers to recover.
+ * It is protected by the recovery_lock.
+ */
+
+struct ocfs2_recovery_map {
+	int rm_used;
+	unsigned int *rm_entries;
+};
+
+int ocfs2_recovery_init(struct ocfs2_super *osb)
+{
+	struct ocfs2_recovery_map *rm;
+
+	mutex_init(&osb->recovery_lock);
+	osb->disable_recovery = 0;
+	osb->recovery_thread_task = NULL;
+	init_waitqueue_head(&osb->recovery_event);
+
+	rm = kzalloc(sizeof(struct ocfs2_recovery_map) +
+		     osb->max_slots * sizeof(unsigned int),
+		     GFP_KERNEL);
+	if (!rm) {
+		mlog_errno(-ENOMEM);
+		return -ENOMEM;
+	}
+
+	rm->rm_entries = (unsigned int *)((char *)rm +
+					  sizeof(struct ocfs2_recovery_map));
+	osb->recovery_map = rm;
+
+	return 0;
+}
+
+/* we can't grab the goofy sem lock from inside wait_event, so we use
+ * memory barriers to make sure that we'll see the null task before
+ * being woken up */
+static int ocfs2_recovery_thread_running(struct ocfs2_super *osb)
+{
+	mb();
+	return osb->recovery_thread_task != NULL;
+}
+
+void ocfs2_recovery_exit(struct ocfs2_super *osb)
+{
+	struct ocfs2_recovery_map *rm;
+
+	/* disable any new recovery threads and wait for any currently
+	 * running ones to exit. Do this before setting the vol_state. */
+	mutex_lock(&osb->recovery_lock);
+	osb->disable_recovery = 1;
+	mutex_unlock(&osb->recovery_lock);
+	wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb));
+
+	/* At this point, we know that no more recovery threads can be
+	 * launched, so wait for any recovery completion work to
+	 * complete. */
+	flush_workqueue(ocfs2_wq);
+
+	/*
+	 * Now that recovery is shut down, and the osb is about to be
+	 * freed,  the osb_lock is not taken here.
+	 */
+	rm = osb->recovery_map;
+	/* XXX: Should we bug if there are dirty entries? */
+
+	kfree(rm);
+}
+
+static int __ocfs2_recovery_map_test(struct ocfs2_super *osb,
+				     unsigned int node_num)
+{
+	int i;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	assert_spin_locked(&osb->osb_lock);
+
+	for (i = 0; i < rm->rm_used; i++) {
+		if (rm->rm_entries[i] == node_num)
+			return 1;
+	}
+
+	return 0;
+}
+
+/* Behaves like test-and-set.  Returns the previous value */
+static int ocfs2_recovery_map_set(struct ocfs2_super *osb,
+				  unsigned int node_num)
+{
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	spin_lock(&osb->osb_lock);
+	if (__ocfs2_recovery_map_test(osb, node_num)) {
+		spin_unlock(&osb->osb_lock);
+		return 1;
+	}
+
+	/* XXX: Can this be exploited? Not from o2dlm... */
+	BUG_ON(rm->rm_used >= osb->max_slots);
+
+	rm->rm_entries[rm->rm_used] = node_num;
+	rm->rm_used++;
+	spin_unlock(&osb->osb_lock);
+
+	return 0;
+}
+
+static void ocfs2_recovery_map_clear(struct ocfs2_super *osb,
+				     unsigned int node_num)
+{
+	int i;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	spin_lock(&osb->osb_lock);
+
+	for (i = 0; i < rm->rm_used; i++) {
+		if (rm->rm_entries[i] == node_num)
+			break;
+	}
+
+	if (i < rm->rm_used) {
+		/* XXX: be careful with the pointer math */
+		memmove(&(rm->rm_entries[i]), &(rm->rm_entries[i + 1]),
+			(rm->rm_used - i - 1) * sizeof(unsigned int));
+		rm->rm_used--;
+	}
+
+	spin_unlock(&osb->osb_lock);
+}
+
 static int ocfs2_commit_cache(struct ocfs2_super *osb)
 {
 	int status = 0;
@@ -654,6 +785,23 @@ bail:
 	return status;
 }
 
+static int ocfs2_recovery_completed(struct ocfs2_super *osb)
+{
+	int empty;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	spin_lock(&osb->osb_lock);
+	empty = (rm->rm_used == 0);
+	spin_unlock(&osb->osb_lock);
+
+	return empty;
+}
+
+void ocfs2_wait_for_recovery(struct ocfs2_super *osb)
+{
+	wait_event(osb->recovery_event, ocfs2_recovery_completed(osb));
+}
+
 /*
  * JBD Might read a cached version of another nodes journal file. We
  * don't want this as this file changes often and we get no
@@ -852,6 +1000,7 @@ static int __ocfs2_recovery_thread(void *arg)
 {
 	int status, node_num;
 	struct ocfs2_super *osb = arg;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
 
 	mlog_entry_void();
 
@@ -867,26 +1016,29 @@ restart:
 		goto bail;
 	}
 
-	while(!ocfs2_node_map_is_empty(osb, &osb->recovery_map)) {
-		node_num = ocfs2_node_map_first_set_bit(osb,
-							&osb->recovery_map);
-		if (node_num == O2NM_INVALID_NODE_NUM) {
-			mlog(0, "Out of nodes to recover.\n");
-			break;
-		}
+	spin_lock(&osb->osb_lock);
+	while (rm->rm_used) {
+		/* It's always safe to remove entry zero, as we won't
+		 * clear it until ocfs2_recover_node() has succeeded. */
+		node_num = rm->rm_entries[0];
+		spin_unlock(&osb->osb_lock);
 
 		status = ocfs2_recover_node(osb, node_num);
-		if (status < 0) {
+		if (!status) {
+			ocfs2_recovery_map_clear(osb, node_num);
+		} else {
 			mlog(ML_ERROR,
 			     "Error %d recovering node %d on device (%u,%u)!\n",
 			     status, node_num,
 			     MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev));
 			mlog(ML_ERROR, "Volume requires unmount.\n");
-			continue;
 		}
 
-		ocfs2_recovery_map_clear(osb, node_num);
+		spin_lock(&osb->osb_lock);
 	}
+	spin_unlock(&osb->osb_lock);
+	mlog(0, "All nodes recovered\n");
+
 	ocfs2_super_unlock(osb, 1);
 
 	/* We always run recovery on our own orphan dir - the dead
@@ -897,8 +1049,7 @@ restart:
 
 bail:
 	mutex_lock(&osb->recovery_lock);
-	if (!status &&
-	    !ocfs2_node_map_is_empty(osb, &osb->recovery_map)) {
+	if (!status && !ocfs2_recovery_completed(osb)) {
 		mutex_unlock(&osb->recovery_lock);
 		goto restart;
 	}
@@ -928,8 +1079,8 @@ void ocfs2_recovery_thread(struct ocfs2_super *osb, int node_num)
 
 	/* People waiting on recovery will wait on
 	 * the recovery map to empty. */
-	if (!ocfs2_recovery_map_set(osb, node_num))
-		mlog(0, "node %d already be in recovery.\n", node_num);
+	if (ocfs2_recovery_map_set(osb, node_num))
+		mlog(0, "node %d already in recovery map.\n", node_num);
 
 	mlog(0, "starting recovery thread...\n");
 
@@ -1202,7 +1353,7 @@ int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 			continue;
 
 		node_num = si->si_global_node_nums[i];
-		if (ocfs2_node_map_test_bit(osb, &osb->recovery_map, node_num))
+		if (__ocfs2_recovery_map_test(osb, node_num))
 			continue;
 		spin_unlock(&si->si_lock);
 
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 52f02fe..986f71f 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -134,6 +134,10 @@ static inline void ocfs2_inode_set_new(struct ocfs2_super *osb,
 
 /* Exported only for the journal struct init code in super.c. Do not call. */
 void ocfs2_complete_recovery(kapi_work_struct_t *work);
+void ocfs2_wait_for_recovery(struct ocfs2_super *osb);
+
+int ocfs2_recovery_init(struct ocfs2_super *osb);
+void ocfs2_recovery_exit(struct ocfs2_super *osb);
 
 /*
  *  Journal Control:
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 9546470..5214c00 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -183,6 +183,7 @@ enum ocfs2_mount_options
 #define OCFS2_DEFAULT_ATIME_QUANTUM	60
 
 struct ocfs2_journal;
+struct ocfs2_recovery_map;
 struct ocfs2_super
 {
 	struct task_struct *commit_task;
@@ -194,7 +195,6 @@ struct ocfs2_super
 	struct ocfs2_slot_info *slot_info;
 
 	spinlock_t node_map_lock;
-	struct ocfs2_node_map recovery_map;
 
 	u64 root_blkno;
 	u64 system_dir_blkno;
@@ -232,6 +232,7 @@ struct ocfs2_super
 
 	atomic_t vol_state;
 	struct mutex recovery_lock;
+	struct ocfs2_recovery_map *recovery_map;
 	struct task_struct *recovery_thread_task;
 	int disable_recovery;
 	wait_queue_head_t checkpoint_event;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index ff31722..fcb7502 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1296,15 +1296,6 @@ leave:
 	return status;
 }
 
-/* we can't grab the goofy sem lock from inside wait_event, so we use
- * memory barriers to make sure that we'll see the null task before
- * being woken up */
-static int ocfs2_recovery_thread_running(struct ocfs2_super *osb)
-{
-	mb();
-	return osb->recovery_thread_task != NULL;
-}
-
 static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 {
 	int tmp;
@@ -1321,17 +1312,8 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 
 	ocfs2_truncate_log_shutdown(osb);
 
-	/* disable any new recovery threads and wait for any currently
-	 * running ones to exit. Do this before setting the vol_state. */
-	mutex_lock(&osb->recovery_lock);
-	osb->disable_recovery = 1;
-	mutex_unlock(&osb->recovery_lock);
-	wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb));
-
-	/* At this point, we know that no more recovery threads can be
-	 * launched, so wait for any recovery completion work to
-	 * complete. */
-	flush_workqueue(ocfs2_wq);
+	/* This will disable recovery and flush any recovery work. */
+	ocfs2_recovery_exit(osb);
 
 	ocfs2_journal_shutdown(osb);
 
@@ -1440,7 +1422,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	osb->s_sectsize_bits = blksize_bits(sector_size);
 	BUG_ON(!osb->s_sectsize_bits);
 
-	init_waitqueue_head(&osb->recovery_event);
 	spin_lock_init(&osb->dc_task_lock);
 	init_waitqueue_head(&osb->dc_event);
 	osb->dc_work_sequence = 0;
@@ -1461,10 +1442,12 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	snprintf(osb->dev_str, sizeof(osb->dev_str), "%u,%u",
 		 MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev));
 
-	mutex_init(&osb->recovery_lock);
-
-	osb->disable_recovery = 0;
-	osb->recovery_thread_task = NULL;
+	status = ocfs2_recovery_init(osb);
+	if (status) {
+		mlog(ML_ERROR, "Unable to initialize recovery state\n");
+		mlog_errno(status);
+		goto bail;
+	}
 
 	init_waitqueue_head(&osb->checkpoint_event);
 	atomic_set(&osb->needs_checkpoint, 0);
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery
  2008-07-07 20:26 [Ocfs2-devel] Four patches for ocfs2 1.4 Sunil Mushran
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: Silence an error message in ocfs2_file_aio_read() Sunil Mushran
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 2/4] ocfs2: Change the recovery map to an array of node numbers Sunil Mushran
@ 2008-07-07 20:26 ` Sunil Mushran
  2008-07-07 20:50   ` Joel Becker
  2008-07-07 20:51   ` Joel Becker
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 4/4] ocfs2/dlm: Fixes oops in dlm_new_lockres() Sunil Mushran
  3 siblings, 2 replies; 11+ messages in thread
From: Sunil Mushran @ 2008-07-07 20:26 UTC (permalink / raw)
  To: ocfs2-devel

This patch fixes a tiny race between recovery and mount that could otherwise
lead to a hang.

During mount, the node takes an EX on the superblock and the journal/slot locks,
and, if needed, recovers the slot it is assigned. However, it only marks the
other dirty slots and leaves their recovery to the recovery thread that does so
after the mount process has completed.

The last step of the mount process calls for it to drop the EX on the superblock
lock. The node holds onto the same lock level on its journal/slot lock.

The recovery thread then picks up the EX on the superblock lock and then takes
the same lock for the journal/slot it is recovering.

This exposes a tiny race as another node mounting the volume could race the
recovery thread for the EX on the superblock lock.

If that happens, that node could be assigned a dirty slot which it would recover.
It too will then drop the EX on the superblock lock but hold onto the same
lock level on its journal/slot lock.

Now when the recovery thread on the first node gets back the EX on the superblock
lock, it will promptly attempt to get the EX on the journal/slot lock of the node
it thinks is dirty but since has not only been recovered but also locked by
another node.

Hang.

The fix here is to make the journal/slot EX lock request during recovery a trylock
operation. If the trylock fails, it would indicate that that slot is in use and
thus recovered.

This bug appears to have been inadvertently introduced during the mount/umount
vote removal by mainline commit 34d024f84345807bf44163fac84e921513dde323. In the
older voting scheme, the voting thread would remove the node being mounted from
the recovery map.

Mark Fasheh <mfasheh@suse.com> wrote the initial draft of this patch fix.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/journal.c |   80 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 64 insertions(+), 16 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index c39eb12..3b14d1f 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/highmem.h>
 #include <linux/kthread.h>
+#include <linux/delay.h>
 
 #define MLOG_MASK_PREFIX ML_JOURNAL
 #include <cluster/masklog.h>
@@ -70,9 +71,14 @@ static int ocfs2_commit_thread(void *arg);
  * It is protected by the recovery_lock.
  */
 
+struct ocfs2_recovery_item {
+	unsigned int ri_node_num;
+	unsigned int ri_count;
+};
+
 struct ocfs2_recovery_map {
 	int rm_used;
-	unsigned int *rm_entries;
+	struct ocfs2_recovery_item *rm_entries;
 };
 
 int ocfs2_recovery_init(struct ocfs2_super *osb)
@@ -85,15 +91,15 @@ int ocfs2_recovery_init(struct ocfs2_super *osb)
 	init_waitqueue_head(&osb->recovery_event);
 
 	rm = kzalloc(sizeof(struct ocfs2_recovery_map) +
-		     osb->max_slots * sizeof(unsigned int),
+		     osb->max_slots * sizeof(struct ocfs2_recovery_item),
 		     GFP_KERNEL);
 	if (!rm) {
 		mlog_errno(-ENOMEM);
 		return -ENOMEM;
 	}
 
-	rm->rm_entries = (unsigned int *)((char *)rm +
-					  sizeof(struct ocfs2_recovery_map));
+	rm->rm_entries = (struct ocfs2_recovery_item *)
+			((char *)rm + sizeof(struct ocfs2_recovery_map));
 	osb->recovery_map = rm;
 
 	return 0;
@@ -143,7 +149,7 @@ static int __ocfs2_recovery_map_test(struct ocfs2_super *osb,
 	assert_spin_locked(&osb->osb_lock);
 
 	for (i = 0; i < rm->rm_used; i++) {
-		if (rm->rm_entries[i] == node_num)
+		if (rm->rm_entries[i].ri_node_num == node_num)
 			return 1;
 	}
 
@@ -155,6 +161,7 @@ static int ocfs2_recovery_map_set(struct ocfs2_super *osb,
 				  unsigned int node_num)
 {
 	struct ocfs2_recovery_map *rm = osb->recovery_map;
+	int i;
 
 	spin_lock(&osb->osb_lock);
 	if (__ocfs2_recovery_map_test(osb, node_num)) {
@@ -165,8 +172,17 @@ static int ocfs2_recovery_map_set(struct ocfs2_super *osb,
 	/* XXX: Can this be exploited? Not from o2dlm... */
 	BUG_ON(rm->rm_used >= osb->max_slots);
 
-	rm->rm_entries[rm->rm_used] = node_num;
+	for (i = 0; i < rm->rm_used; i++) {
+		if (rm->rm_entries[i].ri_node_num == node_num) {
+			rm->rm_entries[i].ri_count++;
+			goto out;
+		}
+	}
+
+	rm->rm_entries[rm->rm_used].ri_node_num = node_num;
+	rm->rm_entries[rm->rm_used].ri_count = 1;
 	rm->rm_used++;
+out:
 	spin_unlock(&osb->osb_lock);
 
 	return 0;
@@ -181,17 +197,21 @@ static void ocfs2_recovery_map_clear(struct ocfs2_super *osb,
 	spin_lock(&osb->osb_lock);
 
 	for (i = 0; i < rm->rm_used; i++) {
-		if (rm->rm_entries[i] == node_num)
+		if (rm->rm_entries[i].ri_node_num == node_num) {
+			rm->rm_entries[i].ri_count--;
+			if (rm->rm_entries[i].ri_count)
+				goto out;
 			break;
+		}
 	}
 
 	if (i < rm->rm_used) {
 		/* XXX: be careful with the pointer math */
 		memmove(&(rm->rm_entries[i]), &(rm->rm_entries[i + 1]),
-			(rm->rm_used - i - 1) * sizeof(unsigned int));
+			(rm->rm_used - i - 1) * sizeof(struct ocfs2_recovery_item));
 		rm->rm_used--;
 	}
-
+out:
 	spin_unlock(&osb->osb_lock);
 }
 
@@ -1020,7 +1040,7 @@ restart:
 	while (rm->rm_used) {
 		/* It's always safe to remove entry zero, as we won't
 		 * clear it until ocfs2_recover_node() has succeeded. */
-		node_num = rm->rm_entries[0];
+		node_num = rm->rm_entries[0].ri_node_num;
 		spin_unlock(&osb->osb_lock);
 
 		status = ocfs2_recover_node(osb, node_num);
@@ -1105,7 +1125,8 @@ out:
  * clean. Will only replay if the journal inode is marked dirty. */
 static int ocfs2_replay_journal(struct ocfs2_super *osb,
 				int node_num,
-				int slot_num)
+				int slot_num,
+				int *busy)
 {
 	int status;
 	int got_lock = 0;
@@ -1115,6 +1136,8 @@ static int ocfs2_replay_journal(struct ocfs2_super *osb,
 	journal_t *journal = NULL;
 	struct buffer_head *bh = NULL;
 
+	*busy = 0;
+
 	inode = ocfs2_get_system_file_inode(osb, JOURNAL_SYSTEM_INODE,
 					    slot_num);
 	if (inode == NULL) {
@@ -1131,8 +1154,23 @@ static int ocfs2_replay_journal(struct ocfs2_super *osb,
 	}
 	SET_INODE_JOURNAL(inode);
 
-	status = ocfs2_inode_lock_full(inode, &bh, 1, OCFS2_META_LOCK_RECOVERY);
+	status = ocfs2_inode_lock_full(inode, &bh, 1,
+		       OCFS2_META_LOCK_RECOVERY|OCFS2_META_LOCK_NOQUEUE);
 	if (status < 0) {
+		/*
+		 * As fs recovery is asynchronous, there is a small possibility
+		 * that the slot being recovered has been assigned to another
+		 * node that happened to be mounting the volume@that same
+		 * time. If so, the journal replay will be handled by that node.
+		 * A failed trylock on the journal indicates that the slot has
+		 * been assigned to another node.
+		 */
+		if (status == -EAGAIN) {
+			*busy = 1;
+			status = 0;
+			goto done;
+		}
+
 		mlog(0, "status returned from ocfs2_inode_lock=%d\n", status);
 		if (status != -ERESTARTSYS)
 			mlog(ML_ERROR, "Could not lock journal!\n");
@@ -1232,7 +1270,7 @@ done:
 static int ocfs2_recover_node(struct ocfs2_super *osb,
 			      int node_num)
 {
-	int status = 0;
+	int status = 0, busy = 0;
 	int slot_num;
 	struct ocfs2_slot_info *si = osb->slot_info;
 	struct ocfs2_dinode *la_copy = NULL;
@@ -1256,11 +1294,16 @@ static int ocfs2_recover_node(struct ocfs2_super *osb,
 
 	mlog(0, "node %d was using slot %d\n", node_num, slot_num);
 
-	status = ocfs2_replay_journal(osb, node_num, slot_num);
+	status = ocfs2_replay_journal(osb, node_num, slot_num, &busy);
 	if (status < 0) {
 		mlog_errno(status);
 		goto done;
 	}
+	if (busy) {
+		mlog(0, "Skipping recovery for slot %u (node %u) "
+		     "as another node has recovered it\n", slot_num, node_num);
+		goto done;
+	}
 
 	/* Stamp a clean local alloc file AFTER recovering the journal... */
 	status = ocfs2_begin_local_alloc_recovery(osb, slot_num, &la_copy);
@@ -1339,7 +1382,7 @@ bail:
  * slot info struct has been updated from disk. */
 int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 {
-	int status, i, node_num;
+	int status, i, node_num, inreco;
 	struct ocfs2_slot_info *si = osb->slot_info;
 
 	/* This is called with the super block cluster lock, so we
@@ -1353,8 +1396,13 @@ int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 			continue;
 
 		node_num = si->si_global_node_nums[i];
-		if (__ocfs2_recovery_map_test(osb, node_num))
+
+		spin_lock(&osb->osb_lock);
+		inreco = __ocfs2_recovery_map_test(osb, node_num);
+		spin_unlock(&osb->osb_lock);
+		if (inreco)
 			continue;
+
 		spin_unlock(&si->si_lock);
 
 		/* Ok, we have a slot occupied by another node which
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 4/4] ocfs2/dlm: Fixes oops in dlm_new_lockres()
  2008-07-07 20:26 [Ocfs2-devel] Four patches for ocfs2 1.4 Sunil Mushran
                   ` (2 preceding siblings ...)
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery Sunil Mushran
@ 2008-07-07 20:26 ` Sunil Mushran
  3 siblings, 0 replies; 11+ messages in thread
From: Sunil Mushran @ 2008-07-07 20:26 UTC (permalink / raw)
  To: ocfs2-devel

Patch fixes a race that can result in an oops while adding a
lockres to the dlm lockres tracking list.

Bug introduced by mainline commit 29576f8bb54045be944ba809d4fca1ad77c94165.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
 fs/ocfs2/dlm/dlmmaster.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 24fb8e4..9d3472e 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -606,7 +606,9 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
 
 	res->last_used = 0;
 
+	spin_lock(&dlm->spinlock);
 	list_add_tail(&res->tracking, &dlm->tracking_list);
+	spin_unlock(&dlm->spinlock);
 
 	memset(res->lvb, 0, DLM_LVB_LEN);
 	memset(res->refmap, 0, sizeof(res->refmap));
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery Sunil Mushran
@ 2008-07-07 20:50   ` Joel Becker
       [not found]     ` <4872C7A4.1010507@oracle.com>
  2008-07-07 20:51   ` Joel Becker
  1 sibling, 1 reply; 11+ messages in thread
From: Joel Becker @ 2008-07-07 20:50 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Jul 07, 2008 at 01:26:19PM -0700, Sunil Mushran wrote:
> This patch fixes a tiny race between recovery and mount that could otherwise
> lead to a hang.

	First off, does this still exist in mainline?

<snip>
 
> The last step of the mount process calls for it to drop the EX on the superblock
> lock. The node holds onto the same lock level on its journal/slot lock.
> 
> The recovery thread then picks up the EX on the superblock lock and then takes
> the same lock for the journal/slot it is recovering.
> 
> This exposes a tiny race as another node mounting the volume could race the
> recovery thread for the EX on the superblock lock.
> 
> If that happens, that node could be assigned a dirty slot which it would recover.
> It too will then drop the EX on the superblock lock but hold onto the same
> lock level on its journal/slot lock.
> 
> Now when the recovery thread on the first node gets back the EX on the superblock
> lock, it will promptly attempt to get the EX on the journal/slot lock of the node
> it thinks is dirty but since has not only been recovered but also locked by
> another node.
> 
> Hang.
> 
> The fix here is to make the journal/slot EX lock request during recovery a trylock
> operation. If the trylock fails, it would indicate that that slot is in use and
> thus recovered.

	How do we distinguish this from a node that hasn't been evicted
yet?  If you recall, we first tried to eliminate the recovery map
completely, having recovery just trylock every journal to find the dead
node.  But if the node hasn't been evicted, the trylock fails.  Isn't
this the same thing?
	Let me try putting it another way.  The recovery thread just
sees entries.  It doesn't know whether they came from mount or from a
dead node event.  So, we have an entry.  Sure, it could be an entry that
was locked during mount, then unlocked, and now we're trying to relock.
The only way the trylock fails is if someone else recovered it.  That's
safe.  If it came from another path, though, we expect to sleep on that
lock - that's how we know the dlm has evicted it, because the lock
blocks until then.  How do we determine this is the case and sleep on
it?  Otherwise, our trylock fails, we skip it, then the dlm evicts it,
and noone has recovered it.  We then continue assuming the journal was
replayed, which it was not.
	I'm missing something, right?

Joel

-- 

"Anything that is too stupid to be spoken is sung."  
        - Voltaire

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery
  2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery Sunil Mushran
  2008-07-07 20:50   ` Joel Becker
@ 2008-07-07 20:51   ` Joel Becker
  1 sibling, 0 replies; 11+ messages in thread
From: Joel Becker @ 2008-07-07 20:51 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Jul 07, 2008 at 01:26:19PM -0700, Sunil Mushran wrote:
> +struct ocfs2_recovery_item {
> +	unsigned int ri_node_num;
> +	unsigned int ri_count;
> +};

	Also, what's the point of the count here?  To try and recover
more than once?

Joel

-- 

 Brain: I shall pollute the water supply with this DNAdefibuliser,
        turning everyone into mindless slaves.
 Pinky: What about the people who drink bottled water?
 Brain: Pinky, people who pay 5 dollars for a bottle of water are
        already mindless slaves.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery
       [not found]     ` <4872C7A4.1010507@oracle.com>
@ 2008-07-08 17:08       ` Sunil Mushran
  2008-07-08 18:17         ` Joel Becker
  0 siblings, 1 reply; 11+ messages in thread
From: Sunil Mushran @ 2008-07-08 17:08 UTC (permalink / raw)
  To: ocfs2-devel

Instead of using generation, we could use ij_pad. Renamed to
something appropriate.

Sunil Mushran wrote:
> ok.. going back to the drawing board. I am putting in writing the
> scheme the two of us thrashed out here. Please review.
>
> Problem:
> How does a recovery thread know that the slot it is trying to recover
> has already been mounted by another node (and thus already recovered)?
>
> Our first solution was to trylock the journal lock with the assumption
> that if it failed, the slot was recovered by another node.
>
> As Joel pointed out, the problem here is that as the fs is informed
> of node death before the dlm, a trylock on journal failing could
> be a case of dlm not yet registering the node death. And skipping
> journal replay in these circumstances would be asking for trouble.
> So we have to make a blocking request... but if we do that blindly,
> we could hang as another node could have mounted that slot.
>
> Possible Solution:
> Use the i_generation in the journal inode as a recovery counter.
> Meaning, we increment it each time we recover the journal. That
> is, from mount thread in ocfs2_journal_load() and from recovery
> thread in ocfs2_replay_journal(). In both places, we have the
> superblock lock making the writes safe.
>
> The mount thread will read the generation#s and save it. In the
> recovery thread, after taking the superblock lock, the thread
> reads the journal inode. If it finds the generation is not the same,
> it can safely skip the recovery. If not, it will go ahead with the
> blocking lock.
>
> As all nodes get the down event, they all go thru with the recovery
> process if only to detect that no recovery is required. We will
> use that to keep the recovery generation for all journal slots uptodate.
>
> Sunil
>
> Joel Becker wrote:
>> On Mon, Jul 07, 2008 at 01:26:19PM -0700, Sunil Mushran wrote:
>>  
>>> This patch fixes a tiny race between recovery and mount that could 
>>> otherwise
>>> lead to a hang.
>>>     
>>
>>     First off, does this still exist in mainline?
>>
>> <snip>
>>  
>>  
>>> The last step of the mount process calls for it to drop the EX on 
>>> the superblock
>>> lock. The node holds onto the same lock level on its journal/slot lock.
>>>
>>> The recovery thread then picks up the EX on the superblock lock and 
>>> then takes
>>> the same lock for the journal/slot it is recovering.
>>>
>>> This exposes a tiny race as another node mounting the volume could 
>>> race the
>>> recovery thread for the EX on the superblock lock.
>>>
>>> If that happens, that node could be assigned a dirty slot which it 
>>> would recover.
>>> It too will then drop the EX on the superblock lock but hold onto 
>>> the same
>>> lock level on its journal/slot lock.
>>>
>>> Now when the recovery thread on the first node gets back the EX on 
>>> the superblock
>>> lock, it will promptly attempt to get the EX on the journal/slot 
>>> lock of the node
>>> it thinks is dirty but since has not only been recovered but also 
>>> locked by
>>> another node.
>>>
>>> Hang.
>>>
>>> The fix here is to make the journal/slot EX lock request during 
>>> recovery a trylock
>>> operation. If the trylock fails, it would indicate that that slot is 
>>> in use and
>>> thus recovered.
>>>     
>>
>>     How do we distinguish this from a node that hasn't been evicted
>> yet?  If you recall, we first tried to eliminate the recovery map
>> completely, having recovery just trylock every journal to find the dead
>> node.  But if the node hasn't been evicted, the trylock fails.  Isn't
>> this the same thing?
>>     Let me try putting it another way.  The recovery thread just
>> sees entries.  It doesn't know whether they came from mount or from a
>> dead node event.  So, we have an entry.  Sure, it could be an entry that
>> was locked during mount, then unlocked, and now we're trying to relock.
>> The only way the trylock fails is if someone else recovered it.  That's
>> safe.  If it came from another path, though, we expect to sleep on that
>> lock - that's how we know the dlm has evicted it, because the lock
>> blocks until then.  How do we determine this is the case and sleep on
>> it?  Otherwise, our trylock fails, we skip it, then the dlm evicts it,
>> and noone has recovered it.  We then continue assuming the journal was
>> replayed, which it was not.
>>     I'm missing something, right?
>>
>> Joel
>>
>>   
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery
  2008-07-08 17:08       ` Sunil Mushran
@ 2008-07-08 18:17         ` Joel Becker
  2008-07-09 17:35           ` Mark Fasheh
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Becker @ 2008-07-08 18:17 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Jul 08, 2008 at 10:08:38AM -0700, Sunil Mushran wrote:
> Instead of using generation, we could use ij_pad. Renamed to
> something appropriate.

	I like that, as it is specific to the journal part and leaves
i_generation alone.  Can we see ourselves needing any other use of that
word?

Joel

> Sunil Mushran wrote:
>> Possible Solution:
>> Use the i_generation in the journal inode as a recovery counter.
>> Meaning, we increment it each time we recover the journal. That
>> is, from mount thread in ocfs2_journal_load() and from recovery
>> thread in ocfs2_replay_journal(). In both places, we have the
>> superblock lock making the writes safe.

-- 

"When ideas fail, words come in very handy." 
         - Goethe

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery
  2008-07-08 18:17         ` Joel Becker
@ 2008-07-09 17:35           ` Mark Fasheh
  2008-07-09 18:29             ` Joel Becker
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Fasheh @ 2008-07-09 17:35 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, Jul 08, 2008 at 11:17:06AM -0700, Joel Becker wrote:
> On Tue, Jul 08, 2008 at 10:08:38AM -0700, Sunil Mushran wrote:
> > Instead of using generation, we could use ij_pad. Renamed to
> > something appropriate.
> 
> 	I like that, as it is specific to the journal part and leaves
> i_generation alone.  Can we see ourselves needing any other use of that
> word?

The only thing that crossed my mind would be journal version (for when we have
JBD2), but I think it's better to use an inode flag for that anyway.
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery
  2008-07-09 17:35           ` Mark Fasheh
@ 2008-07-09 18:29             ` Joel Becker
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Becker @ 2008-07-09 18:29 UTC (permalink / raw)
  To: ocfs2-devel

On Wed, Jul 09, 2008 at 10:35:20AM -0700, Mark Fasheh wrote:
> On Tue, Jul 08, 2008 at 11:17:06AM -0700, Joel Becker wrote:
> > On Tue, Jul 08, 2008 at 10:08:38AM -0700, Sunil Mushran wrote:
> > > Instead of using generation, we could use ij_pad. Renamed to
> > > something appropriate.
> > 
> > 	I like that, as it is specific to the journal part and leaves
> > i_generation alone.  Can we see ourselves needing any other use of that
> > word?
> 
> The only thing that crossed my mind would be journal version (for when we have
> JBD2), but I think it's better to use an inode flag for that anyway.

	Don't we have extra flag space on journal1.ij_flags?  That seems
like the right place for OCFS2_JOURNAL_FLAG_JBD2.

Joel

-- 

"I'm drifting and drifting
 Just like a ship out on the sea.
 Cause I ain't got nobody, baby,
 In this world to care for me."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-07-09 18:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-07 20:26 [Ocfs2-devel] Four patches for ocfs2 1.4 Sunil Mushran
2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 1/4] ocfs2: Silence an error message in ocfs2_file_aio_read() Sunil Mushran
2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 2/4] ocfs2: Change the recovery map to an array of node numbers Sunil Mushran
2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 3/4] ocfs2: Fixes tiny race between mount and recovery Sunil Mushran
2008-07-07 20:50   ` Joel Becker
     [not found]     ` <4872C7A4.1010507@oracle.com>
2008-07-08 17:08       ` Sunil Mushran
2008-07-08 18:17         ` Joel Becker
2008-07-09 17:35           ` Mark Fasheh
2008-07-09 18:29             ` Joel Becker
2008-07-07 20:51   ` Joel Becker
2008-07-07 20:26 ` [Ocfs2-devel] [PATCH 4/4] ocfs2/dlm: Fixes oops in dlm_new_lockres() Sunil Mushran

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.