[Ocfs2-devel] [PATCH 0/62] Ocfs2 updates for 2.6.26-rc1

* [Ocfs2-devel] [PATCH 0/62] Ocfs2 updates for 2.6.26-rc1
@ 2008-04-02 20:14 Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

The following series of patches comprises the bulk of our outstanding
changes for Ocfs2.

Aside from the usual set of cleanups and fixes that were inappropriate for
2.6.25, there are a few highlights:

The '/sys/o2cb' directory has been moved to '/sys/fs/o2cb'. The new location
meshes better with modern sysfs layout. A symbolic link has been placed in
the old location so as to not break old versions of ocfs2-tools. New
versions of ocfs2-tools know to look in /sys/fs/o2cb. Once a transition
period of two years has passed, we can remove the link.
This change required a small patch to sysfs (entirely external to Ocfs2)
which is included here with the appropriate 'Acked-by' lines.
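Because of the compatibility symlink, a tool can simply probe the new path first and fall back to the old one. A minimal userspace sketch, assuming a tool that only needs to locate the directory; `find_first_dir()` and `o2cb_sysfs_dir()` are hypothetical helpers, not code from ocfs2-tools:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: return the first candidate that is a directory. */
const char *find_first_dir(const char *const *candidates, size_t n)
{
	struct stat st;
	size_t i;

	assert(candidates != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* stat() follows symlinks, so the compatibility link counts too. */
		if (stat(candidates[i], &st) == 0 && S_ISDIR(st.st_mode))
			return candidates[i];
	}
	return NULL;
}

/* Hypothetical: prefer the new location, fall back to the old path. */
const char *o2cb_sysfs_dir(void)
{
	static const char *const paths[] = { "/sys/fs/o2cb", "/sys/o2cb" };

	return find_first_dir(paths, sizeof(paths) / sizeof(paths[0]));
}
```

On a kernel without this change, /sys/o2cb is a real directory; on a kernel with it, /sys/o2cb is the symlink. Preferring the new path works in both cases.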

Inode allocation in Ocfs2 has been modified to better handle an annoying
corner case. When a node's local inode allocator fills up, it attempts to
grow the allocator by adding an inode group. This might be impossible
though, if the main file system bitmap is too full or fragmented to provide
the required space. This used to be treated as an ENOSPC condition, but with
the addition of Tao's "inode stealing" patches, the allocation code will
attempt to allocate from other nodes' inode allocators before returning an
error.
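The fallback order can be sketched as follows. This is a simplified model, not the patch itself: each slot's allocator is reduced to a free-inode counter, growing the local allocator from the global bitmap is left out, and the round-robin order used to visit the other slots is an assumption. `reserve_new_inode()` is only loosely named after `ocfs2_reserve_new_inode`:

```c
#include <assert.h>
#include <errno.h>

#define NUM_SLOTS 4	/* hypothetical slot count */

/* Hypothetical per-slot inode allocator: just a count of free inodes. */
struct inode_alloc {
	int free_inodes;
};

/* Try to take one inode from a single slot's allocator. */
static int try_alloc(struct inode_alloc *a)
{
	if (a->free_inodes == 0)
		return -ENOSPC;
	a->free_inodes--;
	return 0;
}

/*
 * Allocate from our own slot first; if that fails, "steal" from the
 * other slots before giving up. Returns the slot the inode came from,
 * or -ENOSPC if every allocator is full.
 */
int reserve_new_inode(struct inode_alloc allocs[NUM_SLOTS], int my_slot)
{
	int i, slot;

	assert(my_slot >= 0 && my_slot < NUM_SLOTS);
	if (try_alloc(&allocs[my_slot]) == 0)
		return my_slot;

	for (i = 1; i < NUM_SLOTS; i++) {
		slot = (my_slot + i) % NUM_SLOTS;	/* each other slot once */
		if (try_alloc(&allocs[slot]) == 0)
			return slot;
	}
	return -ENOSPC;
}
```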

Merging of unwritten extents has also undergone an incremental but
significant improvement - extents can now be merged between leaf nodes. This
ensures that the allocation btree stays as compact as possible, even if
previous write patterns had caused it to fragment. Thanks again go to Tao
for this improvement.

Sunil has improved our ability to debug the Ocfs2 DLM by allowing us to
track DLM state via a set of debugfs files. It's now possible to get a
point-in-time view of master list entries, lock resource states and more.
Debugfs.ocfs2 has been patched to make this process even easier.

And finally, we have Joel's work to allow Ocfs2 to use userspace cluster
stacks. This series of patches is the last step in a multi-year process of
separating the Ocfs2 file system code from the underlying cluster stack.
The file system is now cluster stack agnostic. Users can choose between
the "o2cb" stack, which consists of the traditional Ocfs2 cluster
components (including fs/ocfs2/dlm, also referred to as "o2dlm") or the new
"user" cluster stack. The "user" cluster stack requires a userspace
component to communicate node membership information to the file system via
a misc device. In "user" cluster stack mode, Dave Teigland's dlm (fs/dlm) is
used as it already contains a cluster stack agnostic userspace API.
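Being stack agnostic comes down to calling the cluster stack through a table of function pointers selected by name (the shortlog mentions `ocfs2_stack_operations`). The sketch below shows only that pattern; the two operations, their signatures, and the stub plug-ins are hypothetical and far smaller than the real interface:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the operations a stack plug-in might provide. */
struct stack_operations {
	int (*connect)(const char *cluster_name);
	int (*this_node)(unsigned int *node);
};

struct stack_plugin {
	const char *name;			/* "o2cb" or "user" */
	const struct stack_operations *ops;
};

/* Stub plug-ins standing in for o2cb and the userspace stack. */
static int o2cb_connect(const char *c) { (void)c; return 0; }
static int o2cb_this_node(unsigned int *n) { *n = 1; return 0; }
static int user_connect(const char *c) { (void)c; return 0; }
static int user_this_node(unsigned int *n) { *n = 7; return 0; }

static const struct stack_operations o2cb_ops = {
	.connect = o2cb_connect, .this_node = o2cb_this_node,
};
static const struct stack_operations user_ops = {
	.connect = user_connect, .this_node = user_this_node,
};

static const struct stack_plugin plugins[] = {
	{ "o2cb", &o2cb_ops },
	{ "user", &user_ops },
};

/* The file system looks up a stack once, then only calls through ops. */
const struct stack_operations *find_stack(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(plugins) / sizeof(plugins[0]); i++)
		if (strcmp(plugins[i].name, name) == 0)
			return plugins[i].ops;
	return NULL;
}
```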

This all has several benefits. The most obvious is that we now get to share
code and maintenance cost with other cluster-related projects instead of
re-implementing cluster and dlm features in parallel. Additionally, Ocfs2
users can now run the cluster stack of their choice. For example, while we
anticipate that some users will want to stick with o2cb for its simplicity
of setup and use, many will want access to some of the advanced features
(clustered volume management, hardware fencing, service failover, etc.) that
are already provided by most userspace cluster stacks. These patches allow
for that sort of decision to be made.

Of course, all of this is 100% backwards compatible with old versions of
Ocfs2-tools. Users only need to download a new version of Ocfs2-tools if
they want to take advantage of the userspace cluster stack feature. An
ocfs2-tools tree with code to enable userspace cluster stacks can be found
at:

http://oss.oracle.com/git/?p=ocfs2-tools.git;a=shortlog;h=stack-user

Right now, the sole stack interface implemented in the toolchain is to Red
Hat's "cluster" project. In time, we anticipate that ocfs2-tools will grow
support for other cluster stacks, including linux-ha. Our thanks go to the
folks involved in the "cluster" project. Their help and advice were
instrumental to getting this together.

Finally, my apologies for the large e-mail. There are a lot of patches here
and I wanted to make sure any interested parties had a good idea of what
they represent.
	--Mark

Git branch with these changes:

git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-linus

Diffstat and shortlog:

 Documentation/ABI/obsolete/o2cb            |   11 +
 Documentation/ABI/stable/o2cb              |   10 +
 Documentation/ABI/testing/sysfs-ocfs2      |   89 +++
 Documentation/feature-removal-schedule.txt |   10 +
 MAINTAINERS                                |    1 +
 fs/Kconfig                                 |   26 +
 fs/ocfs2/Makefile                          |   14 +-
 fs/ocfs2/alloc.c                           |  465 +++++++++++++--
 fs/ocfs2/aops.c                            |    6 +-
 fs/ocfs2/cluster/sys.c                     |    9 +
 fs/ocfs2/cluster/tcp.c                     |   96 ++--
 fs/ocfs2/cluster/tcp_internal.h            |    2 +
 fs/ocfs2/dlm/dlmcommon.h                   |   49 ++
 fs/ocfs2/dlm/dlmdebug.c                    |  911 +++++++++++++++++++++++++---
 fs/ocfs2/dlm/dlmdebug.h                    |   86 +++
 fs/ocfs2/dlm/dlmdomain.c                   |   70 ++-
 fs/ocfs2/dlm/dlmlock.c                     |   22 +-
 fs/ocfs2/dlm/dlmmaster.c                   |  200 ++-----
 fs/ocfs2/dlmglue.c                         |  645 ++++++++++++--------
 fs/ocfs2/dlmglue.h                         |    5 +-
 fs/ocfs2/file.c                            |    4 +-
 fs/ocfs2/heartbeat.c                       |  184 +------
 fs/ocfs2/heartbeat.h                       |   17 +-
 fs/ocfs2/ioctl.c                           |   13 +-
 fs/ocfs2/ioctl.h                           |    3 +-
 fs/ocfs2/journal.c                         |  211 ++++++-
 fs/ocfs2/journal.h                         |    4 +
 fs/ocfs2/localalloc.c                      |    4 +
 fs/ocfs2/namei.c                           |    4 +-
 fs/ocfs2/ocfs2.h                           |   77 ++-
 fs/ocfs2/ocfs2_fs.h                        |   79 +++-
 fs/ocfs2/ocfs2_lockid.h                    |    2 +-
 fs/ocfs2/slot_map.c                        |  454 +++++++++++---
 fs/ocfs2/slot_map.h                        |   32 +-
 fs/ocfs2/stack_o2cb.c                      |  420 +++++++++++++
 fs/ocfs2/stack_user.c                      |  883 +++++++++++++++++++++++++++
 fs/ocfs2/stackglue.c                       |  568 +++++++++++++++++
 fs/ocfs2/stackglue.h                       |  261 ++++++++
 fs/ocfs2/suballoc.c                        |  103 +++-
 fs/ocfs2/suballoc.h                        |    1 +
 fs/ocfs2/super.c                           |  208 ++++---
 fs/sysfs/symlink.c                         |    9 +-
 42 files changed, 5230 insertions(+), 1038 deletions(-)
 create mode 100644 Documentation/ABI/obsolete/o2cb
 create mode 100644 Documentation/ABI/stable/o2cb
 create mode 100644 Documentation/ABI/testing/sysfs-ocfs2
 create mode 100644 fs/ocfs2/dlm/dlmdebug.h
 create mode 100644 fs/ocfs2/stack_o2cb.c
 create mode 100644 fs/ocfs2/stack_user.c
 create mode 100644 fs/ocfs2/stackglue.c
 create mode 100644 fs/ocfs2/stackglue.h

Andi Kleen (1):
      ocfs2: Convert ocfs2 over to unlocked_ioctl

David Teigland (2):
      ocfs2: handle async EAGAIN from NOQUEUE request
      ocfs2: add fsdlm to stackglue

Jan Kara (1):
      ocfs2: Improve rename locking

Jeff Mahoney (1):
      ocfs2/cluster: Get rid of arguments to the timeout routines

Joel Becker (33):
      ocfs2: Make ocfs2_slot_info private.
      ocfs2: Change the recovery map to an array of node numbers.
      ocfs2: slot_map I/O based on max_slots.
      ocfs2: De-magic the in-memory slot map.
      ocfs2: Define the contents of the slot_map file.
      ocfs2: New slot map format
      ocfs2: Separate out dlm lock functions.
      ocfs2: Use global DLM_ constants in generic code.
      ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API.
      ocfs2: Create the lock status block union.
      ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.
      ocfs2: Abstract out node number queries.
      ocfs2: Move o2hb functionality into the stack glue.
      ocfs2: Remove CANCELGRANT from the view of dlmglue.
      ocfs2: Abstract out a debugging function for underlying dlms.
      ocfs2: Clean up stackglue initialization
      ocfs2: Split o2cb code from generic stack functions.
      ocfs2: Create ocfs2_stack_operations and split out the o2cb stack.
      ocfs2: Break out stackglue into modules.
      ocfs2: Create stack glue sysfs files.
      ocfs2: Add the USERSPACE_STACK incompat bit.
      ocfs2: Add the 'cluster_stack' sysfs file.
      ocfs2: Add the user stack module.
      ocfs2: Add the ocfs2_control misc device.
      ocfs2: Start the ocfs2_control handshake.
      ocfs2: Introduce the DOWN message to ocfs2_control
      ocfs2: Add the local node id to the handshake.
      ocfs2: Add the 'set version' message to the ocfs2_control device.
      ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h
      ocfs2: Add kbuild for ocfs2_stack_user.ko
      ocfs2: Allow selection of cluster plug-ins.
      ocfs2: Document /sys/fs/ocfs2
      ocfs2: Put tree in MAINTAINERS

Julia Lawall (2):
      fs/ocfs2/aops.c: test for IS_ERR rather than 0
      ocfs2: Use BUG_ON

Mark Fasheh (4):
      ocfs2: Move slot map access into slot_map.c
      ocfs2: Fill node number during cluster stack init
      sysfs: Allow removal of symlinks in the sysfs root
      ocfs2: Move /sys/o2cb to /sys/fs/o2cb

Sunil Mushran (12):
      ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle
      ocfs2/dlm: Create slabcaches for lock and lockres
      ocfs2/dlm: Link all lockres' to a tracking list
      ocfs2/dlm: Create debugfs dirs
      ocfs2/dlm: Dump the dlm state in a debugfs file
      ocfs2/dlm: Dumps the lockres' into a debugfs file
      ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h
      ocfs2/dlm: Dumps the mles into a debugfs file
      ocfs2/dlm: Dumps the purgelist into a debugfs file
      ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
      ocfs2/dlm: Fix lockname in lockres print function
      ocfs2/dlm: Cleanup lockres print

Tao Ma (6):
      ocfs2:  Reconnect after idle time out.
      ocfs2: Add support for cross extent block
      ocfs2: Enable cross extent block merge.
      ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
      ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
      ocfs2: Add inode stealing for ocfs2_reserve_new_inode

* [Ocfs2-devel] [PATCH 01/62] ocfs2: Move slot map access into slot_map.c
@ 2008-04-02 20:14 Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

journal.c and dlmglue.c used to refresh the slot map by hand.  Instead, have
the update and clear functions do the work inside slot_map.c.  The eventual
goal is to make ocfs2_slot_info private to slot_map.c.
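A userspace model of the resulting calling convention: callers pass only the superblock, a missing slot map is treated as a no-op, and freeing detaches the pointer before releasing it. The names loosely mirror the patch; `MAX_SLOTS` and the in-memory `disk` array standing in for the slot map buffer head are assumptions of the sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 8		/* hypothetical stand-in for OCFS2_MAX_SLOTS */
#define INVALID_SLOT (-1)

/* In the real design this definition lives only in slot_map.c. */
struct slot_info {
	int16_t node_nums[MAX_SLOTS];	/* in-memory copy */
	int16_t disk[MAX_SLOTS];	/* hypothetical stand-in for the on-disk block */
};

struct super {
	struct slot_info *slot_info;
};

/* Copy the in-memory map to the simulated disk block. */
static int update_disk_slots(struct slot_info *si)
{
	memcpy(si->disk, si->node_nums, sizeof(si->disk));
	return 0;	/* a real block write could fail; this copy cannot */
}

/* Callers hand over the superblock; the module does the bookkeeping. */
int clear_slot(struct super *sb, int slot)
{
	struct slot_info *si = sb->slot_info;

	if (si == NULL)
		return 0;	/* no slot map loaded: nothing to do */
	assert(slot >= 0 && slot < MAX_SLOTS);
	si->node_nums[slot] = INVALID_SLOT;
	return update_disk_slots(si);
}

/* Re-read the simulated disk block into memory. */
int refresh_slot_info(struct super *sb)
{
	struct slot_info *si = sb->slot_info;

	if (si == NULL)
		return 0;
	memcpy(si->node_nums, si->disk, sizeof(si->node_nums));
	return 0;
}

void free_slot_info(struct super *sb)
{
	struct slot_info *si = sb->slot_info;

	sb->slot_info = NULL;	/* detach first so no caller sees a stale pointer */
	free(si);		/* free(NULL) is a no-op */
}
```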

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c  |    8 +-----
 fs/ocfs2/journal.c  |    3 +-
 fs/ocfs2/slot_map.c |   62 +++++++++++++++++++++++++++++++++++++++-----------
 fs/ocfs2/slot_map.h |   11 +++-----
 fs/ocfs2/super.c    |    3 +-
 5 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 1f1873b..1a80fa9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2132,8 +2132,6 @@ int ocfs2_super_lock(struct ocfs2_super *osb,
 	int status = 0;
 	int level = ex ? LKM_EXMODE : LKM_PRMODE;
 	struct ocfs2_lock_res *lockres = &osb->osb_super_lockres;
-	struct buffer_head *bh;
-	struct ocfs2_slot_info *si = osb->slot_info;
 
 	mlog_entry_void();
 
@@ -2159,11 +2157,7 @@ int ocfs2_super_lock(struct ocfs2_super *osb,
 		goto bail;
 	}
 	if (status) {
-		bh = si->si_bh;
-		status = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0,
-					  si->si_inode);
-		if (status == 0)
-			ocfs2_update_slot_info(si);
+		status = ocfs2_refresh_slot_info(osb);
 
 		ocfs2_complete_lock_res_refresh(lockres, status);
 
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index f31c7e8..c2e654e 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1123,8 +1123,7 @@ static int ocfs2_recover_node(struct ocfs2_super *osb,
 
 	/* Likewise, this would be a strange but ultimately not so
 	 * harmful place to get an error... */
-	ocfs2_clear_slot(si, slot_num);
-	status = ocfs2_update_disk_slots(osb, si);
+	status = ocfs2_clear_slot(osb, slot_num);
 	if (status < 0)
 		mlog_errno(status);
 
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index 3a50ce5..f5727b8 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -49,7 +49,7 @@ static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 			      s16 node_num);
 
 /* post the slot information on disk into our slot_info struct. */
-void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
+static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 {
 	int i;
 	__le16 *disk_info;
@@ -65,10 +65,27 @@ void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 	spin_unlock(&si->si_lock);
 }
 
+int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
+{
+	int ret;
+	struct ocfs2_slot_info *si = osb->slot_info;
+	struct buffer_head *bh;
+
+	if (si == NULL)
+		return 0;
+
+	bh = si->si_bh;
+	ret = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0, si->si_inode);
+	if (ret == 0)
+		ocfs2_update_slot_info(si);
+
+	return ret;
+}
+
 /* post the our slot info stuff into it's destination bh and write it
  * out. */
-int ocfs2_update_disk_slots(struct ocfs2_super *osb,
-			    struct ocfs2_slot_info *si)
+static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
+				   struct ocfs2_slot_info *si)
 {
 	int status, i;
 	__le16 *disk_info = (__le16 *) si->si_bh->b_data;
@@ -135,6 +152,19 @@ s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 	return ret;
 }
 
+static void __ocfs2_free_slot_info(struct ocfs2_slot_info *si)
+{
+	if (si == NULL)
+		return;
+
+	if (si->si_inode)
+		iput(si->si_inode);
+	if (si->si_bh)
+		brelse(si->si_bh);
+
+	kfree(si);
+}
+
 static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 			      s16 slot_num,
 			      s16 node_num)
@@ -147,12 +177,18 @@ static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 	si->si_global_node_nums[slot_num] = node_num;
 }
 
-void ocfs2_clear_slot(struct ocfs2_slot_info *si,
-		      s16 slot_num)
+int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num)
 {
+	struct ocfs2_slot_info *si = osb->slot_info;
+
+	if (si == NULL)
+		return 0;
+
 	spin_lock(&si->si_lock);
 	__ocfs2_fill_slot(si, slot_num, OCFS2_INVALID_SLOT);
 	spin_unlock(&si->si_lock);
+
+	return ocfs2_update_disk_slots(osb, osb->slot_info);
 }
 
 int ocfs2_init_slot_info(struct ocfs2_super *osb)
@@ -202,18 +238,17 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 	osb->slot_info = si;
 bail:
 	if (status < 0 && si)
-		ocfs2_free_slot_info(si);
+		__ocfs2_free_slot_info(si);
 
 	return status;
 }
 
-void ocfs2_free_slot_info(struct ocfs2_slot_info *si)
+void ocfs2_free_slot_info(struct ocfs2_super *osb)
 {
-	if (si->si_inode)
-		iput(si->si_inode);
-	if (si->si_bh)
-		brelse(si->si_bh);
-	kfree(si);
+	struct ocfs2_slot_info *si = osb->slot_info;
+
+	osb->slot_info = NULL;
+	__ocfs2_free_slot_info(si);
 }
 
 int ocfs2_find_slot(struct ocfs2_super *osb)
@@ -285,7 +320,6 @@ void ocfs2_put_slot(struct ocfs2_super *osb)
 	}
 
 bail:
-	osb->slot_info = NULL;
-	ocfs2_free_slot_info(si);
+	ocfs2_free_slot_info(osb);
 }
 
diff --git a/fs/ocfs2/slot_map.h b/fs/ocfs2/slot_map.h
index 1025872..b029ffd 100644
--- a/fs/ocfs2/slot_map.h
+++ b/fs/ocfs2/slot_map.h
@@ -30,7 +30,7 @@
 struct ocfs2_slot_info {
 	spinlock_t si_lock;
 
-       	struct inode *si_inode;
+	struct inode *si_inode;
 	struct buffer_head *si_bh;
 	unsigned int si_num_slots;
 	unsigned int si_size;
@@ -38,19 +38,16 @@ struct ocfs2_slot_info {
 };
 
 int ocfs2_init_slot_info(struct ocfs2_super *osb);
-void ocfs2_free_slot_info(struct ocfs2_slot_info *si);
+void ocfs2_free_slot_info(struct ocfs2_super *osb);
 
 int ocfs2_find_slot(struct ocfs2_super *osb);
 void ocfs2_put_slot(struct ocfs2_super *osb);
 
-void ocfs2_update_slot_info(struct ocfs2_slot_info *si);
-int ocfs2_update_disk_slots(struct ocfs2_super *osb,
-			    struct ocfs2_slot_info *si);
+int ocfs2_refresh_slot_info(struct ocfs2_super *osb);
 
 s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 			   s16 global);
-void ocfs2_clear_slot(struct ocfs2_slot_info *si,
-		      s16 slot_num);
+int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num);
 
 static inline int ocfs2_is_empty_slot(struct ocfs2_slot_info *si,
 				      int slot_num)
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bec75af..fad37af 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1724,8 +1724,7 @@ static void ocfs2_delete_osb(struct ocfs2_super *osb)
 
 	/* This function assumes that the caller has the main osb resource */
 
-	if (osb->slot_info)
-		ocfs2_free_slot_info(osb->slot_info);
+	ocfs2_free_slot_info(osb);
 
 	kfree(osb->osb_orphan_wipes);
 	/* FIXME
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 02/62] ocfs2: Make ocfs2_slot_info private.
@ 2008-04-02 20:14 Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Just use osb_lock around the ocfs2_slot_info data.  This allows us to
make the ocfs2_slot_info structure private to slot_map.c.  All access
is now via accessors.
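A single-threaded sketch of the opaque-type pattern, with `-ENOENT` signalling an empty slot as in the new accessors. The lock is modelled as a flag so the `_locked` naming convention can be checked with `assert()`; in the kernel it is the `osb_lock` spinlock. All names here are hypothetical stand-ins:

```c
#include <assert.h>
#include <errno.h>

#define MAX_SLOTS 4		/* hypothetical */
#define INVALID_SLOT (-1)

struct slot_info;		/* all a header would expose: an opaque type */

struct super {
	int lock_held;			/* flag modelling osb_lock */
	unsigned int max_slots;
	struct slot_info *slot_info;
};

/* Private definition; in the patch it lives only in slot_map.c. */
struct slot_info {
	short node_nums[MAX_SLOTS];
};

static void sb_lock(struct super *sb)   { assert(!sb->lock_held); sb->lock_held = 1; }
static void sb_unlock(struct super *sb) { assert(sb->lock_held);  sb->lock_held = 0; }

/* Caller must already hold the lock, as the _locked suffix says. */
int slot_to_node_num_locked(struct super *sb, int slot, unsigned int *node)
{
	assert(sb->lock_held);
	assert(slot >= 0 && (unsigned int)slot < sb->max_slots);
	if (sb->slot_info->node_nums[slot] == INVALID_SLOT)
		return -ENOENT;
	*node = (unsigned int)sb->slot_info->node_nums[slot];
	return 0;
}

/* Takes the lock itself; returns the slot or -ENOENT. */
int node_num_to_slot(struct super *sb, unsigned int node)
{
	int i, ret = -ENOENT;

	sb_lock(sb);
	for (i = 0; i < (int)sb->max_slots; i++) {
		if (sb->slot_info->node_nums[i] != INVALID_SLOT &&
		    (unsigned int)sb->slot_info->node_nums[i] == node) {
			ret = i;
			break;
		}
	}
	sb_unlock(sb);
	return ret;
}
```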

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/journal.c  |   24 +++++++-------
 fs/ocfs2/ocfs2.h    |    1 +
 fs/ocfs2/slot_map.c |   81 ++++++++++++++++++++++++++++++++++++---------------
 fs/ocfs2/slot_map.h |   25 ++-------------
 4 files changed, 74 insertions(+), 57 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index c2e654e..ed0c6d0 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1079,7 +1079,6 @@ static int ocfs2_recover_node(struct ocfs2_super *osb,
 {
 	int status = 0;
 	int slot_num;
-	struct ocfs2_slot_info *si = osb->slot_info;
 	struct ocfs2_dinode *la_copy = NULL;
 	struct ocfs2_dinode *tl_copy = NULL;
 
@@ -1092,8 +1091,8 @@ static int ocfs2_recover_node(struct ocfs2_super *osb,
 	 * case we should've called ocfs2_journal_load instead. */
 	BUG_ON(osb->node_num == node_num);
 
-	slot_num = ocfs2_node_num_to_slot(si, node_num);
-	if (slot_num == OCFS2_INVALID_SLOT) {
+	slot_num = ocfs2_node_num_to_slot(osb, node_num);
+	if (slot_num == -ENOENT) {
 		status = 0;
 		mlog(0, "no slot for this node, so no recovery required.\n");
 		goto done;
@@ -1183,23 +1182,24 @@ bail:
  * slot info struct has been updated from disk. */
 int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 {
-	int status, i, node_num;
-	struct ocfs2_slot_info *si = osb->slot_info;
+	unsigned int node_num;
+	int status, i;
 
 	/* This is called with the super block cluster lock, so we
 	 * know that the slot map can't change underneath us. */
 
-	spin_lock(&si->si_lock);
-	for(i = 0; i < si->si_num_slots; i++) {
+	spin_lock(&osb->osb_lock);
+	for (i = 0; i < osb->max_slots; i++) {
 		if (i == osb->slot_num)
 			continue;
-		if (ocfs2_is_empty_slot(si, i))
+
+		status = ocfs2_slot_to_node_num_locked(osb, i, &node_num);
+		if (status == -ENOENT)
 			continue;
 
-		node_num = si->si_global_node_nums[i];
 		if (ocfs2_node_map_test_bit(osb, &osb->recovery_map, node_num))
 			continue;
-		spin_unlock(&si->si_lock);
+		spin_unlock(&osb->osb_lock);
 
 		/* Ok, we have a slot occupied by another node which
 		 * is not in the recovery map. We trylock his journal
@@ -1215,9 +1215,9 @@ int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 			goto bail;
 		}
 
-		spin_lock(&si->si_lock);
+		spin_lock(&osb->osb_lock);
 	}
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);
 
 	status = 0;
 bail:
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6546cef..ee3f675 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -179,6 +179,7 @@ enum ocfs2_mount_options
 #define OCFS2_DEFAULT_ATIME_QUANTUM	60
 
 struct ocfs2_journal;
+struct ocfs2_slot_info;
 struct ocfs2_super
 {
 	struct task_struct *commit_task;
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index f5727b8..762360d 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -42,13 +42,25 @@
 
 #include "buffer_head_io.h"
 
+struct ocfs2_slot_info {
+	struct inode *si_inode;
+	struct buffer_head *si_bh;
+	unsigned int si_num_slots;
+	unsigned int si_size;
+	s16 si_global_node_nums[OCFS2_MAX_SLOTS];
+};
+
+
 static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 				    s16 global);
 static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
 			      s16 slot_num,
 			      s16 node_num);
 
-/* post the slot information on disk into our slot_info struct. */
+/*
+ * Post the slot information on disk into our slot_info struct.
+ * Must be protected by osb_lock.
+ */
 static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 {
 	int i;
@@ -56,13 +68,10 @@ static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 
 	/* we don't read the slot block here as ocfs2_super_lock
 	 * should've made sure we have the most recent copy. */
-	spin_lock(&si->si_lock);
 	disk_info = (__le16 *) si->si_bh->b_data;
 
 	for (i = 0; i < si->si_size; i++)
 		si->si_global_node_nums[i] = le16_to_cpu(disk_info[i]);
-
-	spin_unlock(&si->si_lock);
 }
 
 int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
@@ -76,8 +85,11 @@ int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
 
 	bh = si->si_bh;
 	ret = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0, si->si_inode);
-	if (ret == 0)
+	if (ret == 0) {
+		spin_lock(&osb->osb_lock);
 		ocfs2_update_slot_info(si);
+		spin_unlock(&osb->osb_lock);
+	}
 
 	return ret;
 }
@@ -90,10 +102,10 @@ static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
 	int status, i;
 	__le16 *disk_info = (__le16 *) si->si_bh->b_data;
 
-	spin_lock(&si->si_lock);
+	spin_lock(&osb->osb_lock);
 	for (i = 0; i < si->si_size; i++)
 		disk_info[i] = cpu_to_le16(si->si_global_node_nums[i]);
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);
 
 	status = ocfs2_write_block(osb, si->si_bh, si->si_inode);
 	if (status < 0)
@@ -119,7 +131,8 @@ static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
 	return ret;
 }
 
-static s16 __ocfs2_find_empty_slot(struct ocfs2_slot_info *si, s16 preferred)
+static s16 __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
+				   s16 preferred)
 {
 	int i;
 	s16 ret = OCFS2_INVALID_SLOT;
@@ -141,15 +154,36 @@ out:
 	return ret;
 }
 
-s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
-			   s16 global)
+int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num)
 {
-	s16 ret;
+	s16 slot;
+	struct ocfs2_slot_info *si = osb->slot_info;
 
-	spin_lock(&si->si_lock);
-	ret = __ocfs2_node_num_to_slot(si, global);
-	spin_unlock(&si->si_lock);
-	return ret;
+	spin_lock(&osb->osb_lock);
+	slot = __ocfs2_node_num_to_slot(si, node_num);
+	spin_unlock(&osb->osb_lock);
+
+	if (slot == OCFS2_INVALID_SLOT)
+		return -ENOENT;
+
+	return slot;
+}
+
+int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num,
+				  unsigned int *node_num)
+{
+	struct ocfs2_slot_info *si = osb->slot_info;
+
+	assert_spin_locked(&osb->osb_lock);
+
+	BUG_ON(slot_num < 0);
+	BUG_ON(slot_num > osb->max_slots);
+
+	if (si->si_global_node_nums[slot_num] == OCFS2_INVALID_SLOT)
+		return -ENOENT;
+
+	*node_num = si->si_global_node_nums[slot_num];
+	return 0;
 }
 
 static void __ocfs2_free_slot_info(struct ocfs2_slot_info *si)
@@ -184,9 +218,9 @@ int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num)
 	if (si == NULL)
 		return 0;
 
-	spin_lock(&si->si_lock);
+	spin_lock(&osb->osb_lock);
 	__ocfs2_fill_slot(si, slot_num, OCFS2_INVALID_SLOT);
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);
 
 	return ocfs2_update_disk_slots(osb, osb->slot_info);
 }
@@ -206,7 +240,6 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 		goto bail;
 	}
 
-	spin_lock_init(&si->si_lock);
 	si->si_num_slots = osb->max_slots;
 	si->si_size = OCFS2_MAX_SLOTS;
 
@@ -235,7 +268,7 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 
 	si->si_inode = inode;
 	si->si_bh = bh;
-	osb->slot_info = si;
+	osb->slot_info = (struct ocfs2_slot_info *)si;
 bail:
 	if (status < 0 && si)
 		__ocfs2_free_slot_info(si);
@@ -261,9 +294,9 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 
 	si = osb->slot_info;
 
+	spin_lock(&osb->osb_lock);
 	ocfs2_update_slot_info(si);
 
-	spin_lock(&si->si_lock);
 	/* search for ourselves first and take the slot if it already
 	 * exists. Perhaps we need to mark this in a variable for our
 	 * own journal recovery? Possibly not, though we certainly
@@ -274,7 +307,7 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 		 * one. */
 		slot = __ocfs2_find_empty_slot(si, osb->preferred_slot);
 		if (slot == OCFS2_INVALID_SLOT) {
-			spin_unlock(&si->si_lock);
+			spin_unlock(&osb->osb_lock);
 			mlog(ML_ERROR, "no free slots available!\n");
 			status = -EINVAL;
 			goto bail;
@@ -285,7 +318,7 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 
 	__ocfs2_fill_slot(si, slot, osb->node_num);
 	osb->slot_num = slot;
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);
 
 	mlog(0, "taking node slot %d\n", osb->slot_num);
 
@@ -306,12 +339,12 @@ void ocfs2_put_slot(struct ocfs2_super *osb)
 	if (!si)
 		return;
 
+	spin_lock(&osb->osb_lock);
 	ocfs2_update_slot_info(si);
 
-	spin_lock(&si->si_lock);
 	__ocfs2_fill_slot(si, osb->slot_num, OCFS2_INVALID_SLOT);
 	osb->slot_num = OCFS2_INVALID_SLOT;
-	spin_unlock(&si->si_lock);
+	spin_unlock(&osb->osb_lock);
 
 	status = ocfs2_update_disk_slots(osb, si);
 	if (status < 0) {
diff --git a/fs/ocfs2/slot_map.h b/fs/ocfs2/slot_map.h
index b029ffd..5118e89 100644
--- a/fs/ocfs2/slot_map.h
+++ b/fs/ocfs2/slot_map.h
@@ -27,16 +27,6 @@
 #ifndef SLOTMAP_H
 #define SLOTMAP_H
 
-struct ocfs2_slot_info {
-	spinlock_t si_lock;
-
-	struct inode *si_inode;
-	struct buffer_head *si_bh;
-	unsigned int si_num_slots;
-	unsigned int si_size;
-	s16 si_global_node_nums[OCFS2_MAX_SLOTS];
-};
-
 int ocfs2_init_slot_info(struct ocfs2_super *osb);
 void ocfs2_free_slot_info(struct ocfs2_super *osb);
 
@@ -45,17 +35,10 @@ void ocfs2_put_slot(struct ocfs2_super *osb);
 
 int ocfs2_refresh_slot_info(struct ocfs2_super *osb);
 
-s16 ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
-			   s16 global);
-int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num);
+int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num);
+int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num,
+				  unsigned int *node_num);
 
-static inline int ocfs2_is_empty_slot(struct ocfs2_slot_info *si,
-				      int slot_num)
-{
-	BUG_ON(slot_num == OCFS2_INVALID_SLOT);
-	assert_spin_locked(&si->si_lock);
-
-	return si->si_global_node_nums[slot_num] == OCFS2_INVALID_SLOT;
-}
+int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num);
 
 #endif
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 03/62] ocfs2: Change the recovery map to an array of node numbers.
@ 2008-04-02 20:14 Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The old recovery map was a bitmap of node numbers.  This was sufficient
for the maximum node number of 254.  Going forward, we want node numbers
to be UINT32.  Thus, we need a new recovery map.

Note that we can't keep track of slots here.  We must record the
node number for recovery *before* we take the locks needed to convert a
node number into a slot number.

The recovery map is now an array of unsigned ints, max_slots in size.
It moves to journal.c with the rest of recovery.

Because it needs to be initialized, we move all of recovery initialization
into a new function, ocfs2_recovery_init().  This actually cleans up
ocfs2_initialize_super() a little as well.  Likewise, recovery cleanup
becomes part of ocfs2_recovery_exit().
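A sketch of a recovery map that stores node numbers rather than bits and is sized by the slot count. The struct layout, the duplicate check (modelled on the old `ocfs2_recovery_map_set()` reporting whether it actually added the node), and swap-with-last removal are assumptions, not the code in journal.c:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical recovery map: node numbers awaiting recovery. */
struct recovery_map {
	unsigned int rm_used;		/* entries in use */
	unsigned int rm_size;		/* capacity: max_slots */
	unsigned int *rm_entries;	/* node numbers, not slot numbers */
};

int recovery_init(struct recovery_map *rm, unsigned int max_slots)
{
	rm->rm_used = 0;
	rm->rm_size = max_slots;
	rm->rm_entries = calloc(max_slots, sizeof(*rm->rm_entries));
	return rm->rm_entries ? 0 : -ENOMEM;
}

void recovery_exit(struct recovery_map *rm)
{
	free(rm->rm_entries);
	rm->rm_entries = NULL;
	rm->rm_used = rm->rm_size = 0;
}

/* Returns 1 if the node was newly queued, 0 if it was already queued. */
int recovery_map_set(struct recovery_map *rm, unsigned int node)
{
	unsigned int i;

	for (i = 0; i < rm->rm_used; i++)
		if (rm->rm_entries[i] == node)
			return 0;
	/* At most one dead node per slot can be waiting. */
	assert(rm->rm_used < rm->rm_size);
	rm->rm_entries[rm->rm_used++] = node;
	return 1;
}

/* Remove the node, if present, by moving the last entry into its place. */
void recovery_map_clear(struct recovery_map *rm, unsigned int node)
{
	unsigned int i;

	for (i = 0; i < rm->rm_used; i++) {
		if (rm->rm_entries[i] == node) {
			rm->rm_entries[i] = rm->rm_entries[--rm->rm_used];
			return;
		}
	}
}
```

Unlike a bitmap indexed by node number, this array's size depends only on the slot count, so node numbers anywhere in the 32-bit range fit.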

A number of node map functions are rendered obsolete and are removed.

Finally, waiting on recovery is wrapped in a function rather than naked
checks on the recovery_event.  This is a cleanup from Mark.
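With the wrapper, the wait condition lives in one place instead of being open-coded at every call site. A userspace analogue, assuming a pthread condition variable in place of the kernel's `wait_event()` on `recovery_event`; `wait_for_recovery()` and `recovery_node_done()` are hypothetical, and the sketch does not claim to match the body of the real `ocfs2_wait_for_recovery()`:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for osb->recovery_event and the recovery map. */
struct recovery_state {
	pthread_mutex_t lock;
	pthread_cond_t done;		/* plays the role of recovery_event */
	unsigned int pending;		/* nodes still awaiting recovery */
};

/* Caller holds rs->lock. */
static int recovery_map_empty(struct recovery_state *rs)
{
	return rs->pending == 0;
}

/* One helper instead of a naked wait at every call site. */
void wait_for_recovery(struct recovery_state *rs)
{
	pthread_mutex_lock(&rs->lock);
	while (!recovery_map_empty(rs))	/* loop guards against spurious wakeups */
		pthread_cond_wait(&rs->done, &rs->lock);
	pthread_mutex_unlock(&rs->lock);
}

/* Called by the recovery thread after a node has been recovered. */
void recovery_node_done(struct recovery_state *rs)
{
	pthread_mutex_lock(&rs->lock);
	assert(rs->pending > 0);
	if (--rs->pending == 0)
		pthread_cond_broadcast(&rs->done);
	pthread_mutex_unlock(&rs->lock);
}
```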

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |    6 +-
 fs/ocfs2/heartbeat.c |  111 ------------------------------
 fs/ocfs2/heartbeat.h |   14 ----
 fs/ocfs2/journal.c   |  181 +++++++++++++++++++++++++++++++++++++++++++++----
 fs/ocfs2/journal.h   |    4 +
 fs/ocfs2/ocfs2.h     |    3 +-
 fs/ocfs2/super.c     |   33 ++-------
 7 files changed, 182 insertions(+), 170 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 1a80fa9..15a5167 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1950,8 +1950,7 @@ int ocfs2_inode_lock_full(struct inode *inode,
 		goto local;
 
 	if (!(arg_flags & OCFS2_META_LOCK_RECOVERY))
-		wait_event(osb->recovery_event,
-			   ocfs2_node_map_is_empty(osb, &osb->recovery_map));
+		ocfs2_wait_for_recovery(osb);
 
 	lockres = &OCFS2_I(inode)->ip_inode_lockres;
 	level = ex ? LKM_EXMODE : LKM_PRMODE;
@@ -1974,8 +1973,7 @@ int ocfs2_inode_lock_full(struct inode *inode,
 	 * committed to owning this lock so we don't allow signals to
 	 * abort the operation. */
 	if (!(arg_flags & OCFS2_META_LOCK_RECOVERY))
-		wait_event(osb->recovery_event,
-			   ocfs2_node_map_is_empty(osb, &osb->recovery_map));
+		ocfs2_wait_for_recovery(osb);
 
 local:
 	/*
diff --git a/fs/ocfs2/heartbeat.c b/fs/ocfs2/heartbeat.c
index 0758daf..80de239 100644
--- a/fs/ocfs2/heartbeat.c
+++ b/fs/ocfs2/heartbeat.c
@@ -48,7 +48,6 @@ static inline void __ocfs2_node_map_set_bit(struct ocfs2_node_map *map,
 					    int bit);
 static inline void __ocfs2_node_map_clear_bit(struct ocfs2_node_map *map,
 					      int bit);
-static inline int __ocfs2_node_map_is_empty(struct ocfs2_node_map *map);
 
 /* special case -1 for now
  * TODO: should *really* make sure the calling func never passes -1!!  */
@@ -62,7 +61,6 @@ static void ocfs2_node_map_init(struct ocfs2_node_map *map)
 void ocfs2_init_node_maps(struct ocfs2_super *osb)
 {
 	spin_lock_init(&osb->node_map_lock);
-	ocfs2_node_map_init(&osb->recovery_map);
 	ocfs2_node_map_init(&osb->osb_recovering_orphan_dirs);
 }
 
@@ -192,112 +190,3 @@ int ocfs2_node_map_test_bit(struct ocfs2_super *osb,
 	return ret;
 }
 
-static inline int __ocfs2_node_map_is_empty(struct ocfs2_node_map *map)
-{
-	int bit;
-	bit = find_next_bit(map->map, map->num_nodes, 0);
-	if (bit < map->num_nodes)
-		return 0;
-	return 1;
-}
-
-int ocfs2_node_map_is_empty(struct ocfs2_super *osb,
-			    struct ocfs2_node_map *map)
-{
-	int ret;
-	BUG_ON(map->num_nodes == 0);
-	spin_lock(&osb->node_map_lock);
-	ret = __ocfs2_node_map_is_empty(map);
-	spin_unlock(&osb->node_map_lock);
-	return ret;
-}
-
-#if 0
-
-static void __ocfs2_node_map_dup(struct ocfs2_node_map *target,
-				 struct ocfs2_node_map *from)
-{
-	BUG_ON(from->num_nodes == 0);
-	ocfs2_node_map_init(target);
-	__ocfs2_node_map_set(target, from);
-}
-
-/* returns 1 if bit is the only bit set in target, 0 otherwise */
-int ocfs2_node_map_is_only(struct ocfs2_super *osb,
-			   struct ocfs2_node_map *target,
-			   int bit)
-{
-	struct ocfs2_node_map temp;
-	int ret;
-
-	spin_lock(&osb->node_map_lock);
-	__ocfs2_node_map_dup(&temp, target);
-	__ocfs2_node_map_clear_bit(&temp, bit);
-	ret = __ocfs2_node_map_is_empty(&temp);
-	spin_unlock(&osb->node_map_lock);
-
-	return ret;
-}
-
-static void __ocfs2_node_map_set(struct ocfs2_node_map *target,
-				 struct ocfs2_node_map *from)
-{
-	int num_longs, i;
-
-	BUG_ON(target->num_nodes != from->num_nodes);
-	BUG_ON(target->num_nodes == 0);
-
-	num_longs = BITS_TO_LONGS(target->num_nodes);
-	for (i = 0; i < num_longs; i++)
-		target->map[i] = from->map[i];
-}
-
-#endif  /*  0  */
-
-/* Returns whether the recovery bit was actually set - it may not be
- * if a node is still marked as needing recovery */
-int ocfs2_recovery_map_set(struct ocfs2_super *osb,
-			   int num)
-{
-	int set = 0;
-
-	spin_lock(&osb->node_map_lock);
-
-	if (!test_bit(num, osb->recovery_map.map)) {
-	    __ocfs2_node_map_set_bit(&osb->recovery_map, num);
-	    set = 1;
-	}
-
-	spin_unlock(&osb->node_map_lock);
-
-	return set;
-}
-
-void ocfs2_recovery_map_clear(struct ocfs2_super *osb,
-			      int num)
-{
-	ocfs2_node_map_clear_bit(osb, &osb->recovery_map, num);
-}
-
-int ocfs2_node_map_iterate(struct ocfs2_super *osb,
-			   struct ocfs2_node_map *map,
-			   int idx)
-{
-	int i = idx;
-
-	idx = O2NM_INVALID_NODE_NUM;
-	spin_lock(&osb->node_map_lock);
-	if ((i != O2NM_INVALID_NODE_NUM) &&
-	    (i >= 0) &&
-	    (i < map->num_nodes)) {
-		while(i < map->num_nodes) {
-			if (test_bit(i, map->map)) {
-				idx = i;
-				break;
-			}
-			i++;
-		}
-	}
-	spin_unlock(&osb->node_map_lock);
-	return idx;
-}
diff --git a/fs/ocfs2/heartbeat.h b/fs/ocfs2/heartbeat.h
index eac63ae..98d8ffc 100644
--- a/fs/ocfs2/heartbeat.h
+++ b/fs/ocfs2/heartbeat.h
@@ -33,8 +33,6 @@ void ocfs2_stop_heartbeat(struct ocfs2_super *osb);
 
 /* node map functions - used to keep track of mounted and in-recovery
  * nodes. */
-int ocfs2_node_map_is_empty(struct ocfs2_super *osb,
-			    struct ocfs2_node_map *map);
 void ocfs2_node_map_set_bit(struct ocfs2_super *osb,
 			    struct ocfs2_node_map *map,
 			    int bit);
@@ -44,17 +42,5 @@ void ocfs2_node_map_clear_bit(struct ocfs2_super *osb,
 int ocfs2_node_map_test_bit(struct ocfs2_super *osb,
 			    struct ocfs2_node_map *map,
 			    int bit);
-int ocfs2_node_map_iterate(struct ocfs2_super *osb,
-			   struct ocfs2_node_map *map,
-			   int idx);
-static inline int ocfs2_node_map_first_set_bit(struct ocfs2_super *osb,
-					       struct ocfs2_node_map *map)
-{
-	return ocfs2_node_map_iterate(osb, map, 0);
-}
-int ocfs2_recovery_map_set(struct ocfs2_super *osb,
-			   int num);
-void ocfs2_recovery_map_clear(struct ocfs2_super *osb,
-			      int num);
 
 #endif /* OCFS2_HEARTBEAT_H */
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index ed0c6d0..ca4c0ea 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -64,6 +64,137 @@ static int ocfs2_recover_orphans(struct ocfs2_super *osb,
 				 int slot);
 static int ocfs2_commit_thread(void *arg);
 
+
+/*
+ * The recovery_list is a simple linked list of node numbers to recover.
+ * It is protected by the recovery_lock.
+ */
+
+struct ocfs2_recovery_map {
+	int rm_used;
+	unsigned int *rm_entries;
+};
+
+int ocfs2_recovery_init(struct ocfs2_super *osb)
+{
+	struct ocfs2_recovery_map *rm;
+
+	mutex_init(&osb->recovery_lock);
+	osb->disable_recovery = 0;
+	osb->recovery_thread_task = NULL;
+	init_waitqueue_head(&osb->recovery_event);
+
+	rm = kzalloc(sizeof(struct ocfs2_recovery_map) +
+		     osb->max_slots * sizeof(unsigned int),
+		     GFP_KERNEL);
+	if (!rm) {
+		mlog_errno(-ENOMEM);
+		return -ENOMEM;
+	}
+
+	rm->rm_entries = (unsigned int *)((char *)rm +
+					  sizeof(struct ocfs2_recovery_map));
+	osb->recovery_map = rm;
+
+	return 0;
+}
+
+/* we can't grab the goofy sem lock from inside wait_event, so we use
+ * memory barriers to make sure that we'll see the null task before
+ * being woken up */
+static int ocfs2_recovery_thread_running(struct ocfs2_super *osb)
+{
+	mb();
+	return osb->recovery_thread_task != NULL;
+}
+
+void ocfs2_recovery_exit(struct ocfs2_super *osb)
+{
+	struct ocfs2_recovery_map *rm;
+
+	/* disable any new recovery threads and wait for any currently
+	 * running ones to exit. Do this before setting the vol_state. */
+	mutex_lock(&osb->recovery_lock);
+	osb->disable_recovery = 1;
+	mutex_unlock(&osb->recovery_lock);
+	wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb));
+
+	/* At this point, we know that no more recovery threads can be
+	 * launched, so wait for any recovery completion work to
+	 * complete. */
+	flush_workqueue(ocfs2_wq);
+
+	/*
+	 * Now that recovery is shut down, and the osb is about to be
+	 * freed,  the osb_lock is not taken here.
+	 */
+	rm = osb->recovery_map;
+	/* XXX: Should we bug if there are dirty entries? */
+
+	kfree(rm);
+}
+
+static int __ocfs2_recovery_map_test(struct ocfs2_super *osb,
+				     unsigned int node_num)
+{
+	int i;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	assert_spin_locked(&osb->osb_lock);
+
+	for (i = 0; i < rm->rm_used; i++) {
+		if (rm->rm_entries[i] == node_num)
+			return 1;
+	}
+
+	return 0;
+}
+
+/* Behaves like test-and-set.  Returns the previous value */
+static int ocfs2_recovery_map_set(struct ocfs2_super *osb,
+				  unsigned int node_num)
+{
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	spin_lock(&osb->osb_lock);
+	if (__ocfs2_recovery_map_test(osb, node_num)) {
+		spin_unlock(&osb->osb_lock);
+		return 1;
+	}
+
+	/* XXX: Can this be exploited? Not from o2dlm... */
+	BUG_ON(rm->rm_used >= osb->max_slots);
+
+	rm->rm_entries[rm->rm_used] = node_num;
+	rm->rm_used++;
+	spin_unlock(&osb->osb_lock);
+
+	return 0;
+}
+
+static void ocfs2_recovery_map_clear(struct ocfs2_super *osb,
+				     unsigned int node_num)
+{
+	int i;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	spin_lock(&osb->osb_lock);
+
+	for (i = 0; i < rm->rm_used; i++) {
+		if (rm->rm_entries[i] == node_num)
+			break;
+	}
+
+	if (i < rm->rm_used) {
+		/* XXX: be careful with the pointer math */
+		memmove(&(rm->rm_entries[i]), &(rm->rm_entries[i + 1]),
+			(rm->rm_used - i - 1) * sizeof(unsigned int));
+		rm->rm_used--;
+	}
+
+	spin_unlock(&osb->osb_lock);
+}
+
 static int ocfs2_commit_cache(struct ocfs2_super *osb)
 {
 	int status = 0;
@@ -650,6 +781,23 @@ bail:
 	return status;
 }
 
+static int ocfs2_recovery_completed(struct ocfs2_super *osb)
+{
+	int empty;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
+
+	spin_lock(&osb->osb_lock);
+	empty = (rm->rm_used == 0);
+	spin_unlock(&osb->osb_lock);
+
+	return empty;
+}
+
+void ocfs2_wait_for_recovery(struct ocfs2_super *osb)
+{
+	wait_event(osb->recovery_event, ocfs2_recovery_completed(osb));
+}
+
 /*
  * JBD Might read a cached version of another nodes journal file. We
  * don't want this as this file changes often and we get no
@@ -848,6 +996,7 @@ static int __ocfs2_recovery_thread(void *arg)
 {
 	int status, node_num;
 	struct ocfs2_super *osb = arg;
+	struct ocfs2_recovery_map *rm = osb->recovery_map;
 
 	mlog_entry_void();
 
@@ -863,26 +1012,29 @@ restart:
 		goto bail;
 	}
 
-	while(!ocfs2_node_map_is_empty(osb, &osb->recovery_map)) {
-		node_num = ocfs2_node_map_first_set_bit(osb,
-							&osb->recovery_map);
-		if (node_num == O2NM_INVALID_NODE_NUM) {
-			mlog(0, "Out of nodes to recover.\n");
-			break;
-		}
+	spin_lock(&osb->osb_lock);
+	while (rm->rm_used) {
+		/* It's always safe to remove entry zero, as we won't
+		 * clear it until ocfs2_recover_node() has succeeded. */
+		node_num = rm->rm_entries[0];
+		spin_unlock(&osb->osb_lock);
 
 		status = ocfs2_recover_node(osb, node_num);
-		if (status < 0) {
+		if (!status) {
+			ocfs2_recovery_map_clear(osb, node_num);
+		} else {
 			mlog(ML_ERROR,
 			     "Error %d recovering node %d on device (%u,%u)!\n",
 			     status, node_num,
 			     MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev));
 			mlog(ML_ERROR, "Volume requires unmount.\n");
-			continue;
 		}
 
-		ocfs2_recovery_map_clear(osb, node_num);
+		spin_lock(&osb->osb_lock);
 	}
+	spin_unlock(&osb->osb_lock);
+	mlog(0, "All nodes recovered\n");
+
 	ocfs2_super_unlock(osb, 1);
 
 	/* We always run recovery on our own orphan dir - the dead
@@ -893,8 +1045,7 @@ restart:
 
 bail:
 	mutex_lock(&osb->recovery_lock);
-	if (!status &&
-	    !ocfs2_node_map_is_empty(osb, &osb->recovery_map)) {
+	if (!status && !ocfs2_recovery_completed(osb)) {
 		mutex_unlock(&osb->recovery_lock);
 		goto restart;
 	}
@@ -924,8 +1075,8 @@ void ocfs2_recovery_thread(struct ocfs2_super *osb, int node_num)
 
 	/* People waiting on recovery will wait on
 	 * the recovery map to empty. */
-	if (!ocfs2_recovery_map_set(osb, node_num))
-		mlog(0, "node %d already be in recovery.\n", node_num);
+	if (ocfs2_recovery_map_set(osb, node_num))
+		mlog(0, "node %d already in recovery map.\n", node_num);
 
 	mlog(0, "starting recovery thread...\n");
 
@@ -1197,7 +1348,7 @@ int ocfs2_mark_dead_nodes(struct ocfs2_super *osb)
 		if (status == -ENOENT)
 			continue;
 
-		if (ocfs2_node_map_test_bit(osb, &osb->recovery_map, node_num))
+		if (__ocfs2_recovery_map_test(osb, node_num))
 			continue;
 		spin_unlock(&osb->osb_lock);
 
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 220f3e8..db82be2 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -134,6 +134,10 @@ static inline void ocfs2_inode_set_new(struct ocfs2_super *osb,
 
 /* Exported only for the journal struct init code in super.c. Do not call. */
 void ocfs2_complete_recovery(struct work_struct *work);
+void ocfs2_wait_for_recovery(struct ocfs2_super *osb);
+
+int ocfs2_recovery_init(struct ocfs2_super *osb);
+void ocfs2_recovery_exit(struct ocfs2_super *osb);
 
 /*
  *  Journal Control:
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index ee3f675..c6ed8c3 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -180,6 +180,7 @@ enum ocfs2_mount_options
 
 struct ocfs2_journal;
 struct ocfs2_slot_info;
+struct ocfs2_recovery_map;
 struct ocfs2_super
 {
 	struct task_struct *commit_task;
@@ -191,7 +192,6 @@ struct ocfs2_super
 	struct ocfs2_slot_info *slot_info;
 
 	spinlock_t node_map_lock;
-	struct ocfs2_node_map recovery_map;
 
 	u64 root_blkno;
 	u64 system_dir_blkno;
@@ -226,6 +226,7 @@ struct ocfs2_super
 
 	atomic_t vol_state;
 	struct mutex recovery_lock;
+	struct ocfs2_recovery_map *recovery_map;
 	struct task_struct *recovery_thread_task;
 	int disable_recovery;
 	wait_queue_head_t checkpoint_event;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index fad37af..1a4c7c7 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1224,15 +1224,6 @@ leave:
 	return status;
 }
 
-/* we can't grab the goofy sem lock from inside wait_event, so we use
- * memory barriers to make sure that we'll see the null task before
- * being woken up */
-static int ocfs2_recovery_thread_running(struct ocfs2_super *osb)
-{
-	mb();
-	return osb->recovery_thread_task != NULL;
-}
-
 static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 {
 	int tmp;
@@ -1249,17 +1240,8 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 
 	ocfs2_truncate_log_shutdown(osb);
 
-	/* disable any new recovery threads and wait for any currently
-	 * running ones to exit. Do this before setting the vol_state. */
-	mutex_lock(&osb->recovery_lock);
-	osb->disable_recovery = 1;
-	mutex_unlock(&osb->recovery_lock);
-	wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb));
-
-	/* At this point, we know that no more recovery threads can be
-	 * launched, so wait for any recovery completion work to
-	 * complete. */
-	flush_workqueue(ocfs2_wq);
+	/* This will disable recovery and flush any recovery work. */
+	ocfs2_recovery_exit(osb);
 
 	ocfs2_journal_shutdown(osb);
 
@@ -1368,7 +1350,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	osb->s_sectsize_bits = blksize_bits(sector_size);
 	BUG_ON(!osb->s_sectsize_bits);
 
-	init_waitqueue_head(&osb->recovery_event);
 	spin_lock_init(&osb->dc_task_lock);
 	init_waitqueue_head(&osb->dc_event);
 	osb->dc_work_sequence = 0;
@@ -1388,10 +1369,12 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	snprintf(osb->dev_str, sizeof(osb->dev_str), "%u,%u",
 		 MAJOR(osb->sb->s_dev), MINOR(osb->sb->s_dev));
 
-	mutex_init(&osb->recovery_lock);
-
-	osb->disable_recovery = 0;
-	osb->recovery_thread_task = NULL;
+	status = ocfs2_recovery_init(osb);
+	if (status) {
+		mlog(ML_ERROR, "Unable to initialize recovery state\n");
+		mlog_errno(status);
+		goto bail;
+	}
 
 	init_waitqueue_head(&osb->checkpoint_event);
 	atomic_set(&osb->needs_checkpoint, 0);
-- 
1.5.4.1
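
Patch 03/62 above replaces the recovery bitmap with a dense array of node numbers. Below is a minimal userspace sketch of that data structure. It assumes a single thread; the kernel code holds osb->osb_lock around every access. The recovery_map_* names are hypothetical, and so is the rm_max field, which stands in for osb->max_slots.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of struct ocfs2_recovery_map. */
struct recovery_map {
	unsigned int rm_used;      /* number of valid entries */
	unsigned int rm_max;       /* capacity; stands in for osb->max_slots */
	unsigned int *rm_entries;  /* node numbers awaiting recovery */
};

/* One allocation holds the header and the entry array, as in
 * ocfs2_recovery_init().  Returns NULL on allocation failure. */
static struct recovery_map *recovery_map_alloc(unsigned int max_slots)
{
	struct recovery_map *rm;

	rm = calloc(1, sizeof(*rm) + max_slots * sizeof(unsigned int));
	if (!rm)
		return NULL;
	rm->rm_max = max_slots;
	rm->rm_entries = (unsigned int *)(rm + 1);
	return rm;
}

static int recovery_map_test(const struct recovery_map *rm, unsigned int node)
{
	for (unsigned int i = 0; i < rm->rm_used; i++)
		if (rm->rm_entries[i] == node)
			return 1;
	return 0;
}

/* Test-and-set: returns 1 if node was already present, 0 if it was added. */
static int recovery_map_set(struct recovery_map *rm, unsigned int node)
{
	if (recovery_map_test(rm, node))
		return 1;
	assert(rm->rm_used < rm->rm_max);  /* the kernel uses BUG_ON() here */
	rm->rm_entries[rm->rm_used++] = node;
	return 0;
}

/* Remove node, shifting later entries down so the array stays dense. */
static void recovery_map_clear(struct recovery_map *rm, unsigned int node)
{
	unsigned int i;

	for (i = 0; i < rm->rm_used; i++)
		if (rm->rm_entries[i] == node)
			break;
	if (i < rm->rm_used) {
		memmove(&rm->rm_entries[i], &rm->rm_entries[i + 1],
			(rm->rm_used - i - 1) * sizeof(unsigned int));
		rm->rm_used--;
	}
}
```

Removal keeps the array dense, so the recovery thread can always take rm_entries[0]. The patch clears that entry only after ocfs2_recover_node() succeeds.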

* [Ocfs2-devel] [PATCH 04/62] ocfs2: slot_map I/O based on max_slots.
  2008-04-02 20:14     ` [Ocfs2-devel] [PATCH 03/62] ocfs2: Change the recovery map to an array of node numbers Mark Fasheh
@ 2008-04-02 20:14       ` Mark Fasheh
  2008-04-02 20:14         ` [Ocfs2-devel] [PATCH 05/62] ocfs2: De-magic the in-memory slot map Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The slot map code assumed that the slot_map file has exactly one block
allocated.  This changes the code to read as many blocks as are needed to
cover max_slots.

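The following is a rough sketch of the sizing rule, assuming that ocfs2_blocks_for_bytes() rounds a byte count up to whole blocks. The slot map needs max_slots * sizeof(__le16) bytes, and the file must be at least that large. It mirrors ocfs2_slot_map_physical_size() followed by the block count in ocfs2_map_slot_buffers(). The function name and parameters are hypothetical stand-ins for osb->max_slots, i_size_read() and the superblock block size.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: how many blocks must be read to cover max_slots? */
static int slot_map_blocks(unsigned int max_slots, uint64_t i_size,
			   unsigned int blocksize, uint64_t *blocks)
{
	/* Each slot is one 16-bit entry on disk. */
	uint64_t bytes_needed = (uint64_t)max_slots * sizeof(uint16_t);

	/* The slot_map file must be large enough to hold every slot. */
	if (bytes_needed > i_size)
		return -ENOSPC;

	/* Round up to whole blocks. */
	*blocks = (bytes_needed + blocksize - 1) / blocksize;
	return 0;
}
```

For example, 255 slots need 510 bytes, which is one 512-byte block. A hypothetical 300 slots would need 600 bytes, which is two 512-byte blocks.
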
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/slot_map.c |  128 +++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 108 insertions(+), 20 deletions(-)

diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index 762360d..5bddee1 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -44,7 +44,8 @@
 
 struct ocfs2_slot_info {
 	struct inode *si_inode;
-	struct buffer_head *si_bh;
+	unsigned int si_blocks;
+	struct buffer_head **si_bh;
 	unsigned int si_num_slots;
 	unsigned int si_size;
 	s16 si_global_node_nums[OCFS2_MAX_SLOTS];
@@ -68,7 +69,7 @@ static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 
 	/* we don't read the slot block here as ocfs2_super_lock
 	 * should've made sure we have the most recent copy. */
-	disk_info = (__le16 *) si->si_bh->b_data;
+	disk_info = (__le16 *) si->si_bh[0]->b_data;
 
 	for (i = 0; i < si->si_size; i++)
 		si->si_global_node_nums[i] = le16_to_cpu(disk_info[i]);
@@ -78,13 +79,23 @@ int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
 {
 	int ret;
 	struct ocfs2_slot_info *si = osb->slot_info;
-	struct buffer_head *bh;
 
 	if (si == NULL)
 		return 0;
 
-	bh = si->si_bh;
-	ret = ocfs2_read_block(osb, bh->b_blocknr, &bh, 0, si->si_inode);
+	BUG_ON(si->si_blocks == 0);
+	BUG_ON(si->si_bh == NULL);
+
+	mlog(0, "Refreshing slot map, reading %u block(s)\n",
+	     si->si_blocks);
+
+	/*
+	 * We pass -1 as blocknr because we expect all of si->si_bh to
+	 * be !NULL.  Thus, ocfs2_read_blocks() will ignore blocknr.  If
+	 * this is not true, the read of -1 (UINT64_MAX) will fail.
+	 */
+	ret = ocfs2_read_blocks(osb, -1, si->si_blocks, si->si_bh, 0,
+				si->si_inode);
 	if (ret == 0) {
 		spin_lock(&osb->osb_lock);
 		ocfs2_update_slot_info(si);
@@ -100,20 +111,42 @@ static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
 				   struct ocfs2_slot_info *si)
 {
 	int status, i;
-	__le16 *disk_info = (__le16 *) si->si_bh->b_data;
+	__le16 *disk_info = (__le16 *) si->si_bh[0]->b_data;
 
 	spin_lock(&osb->osb_lock);
 	for (i = 0; i < si->si_size; i++)
 		disk_info[i] = cpu_to_le16(si->si_global_node_nums[i]);
 	spin_unlock(&osb->osb_lock);
 
-	status = ocfs2_write_block(osb, si->si_bh, si->si_inode);
+	status = ocfs2_write_block(osb, si->si_bh[0], si->si_inode);
 	if (status < 0)
 		mlog_errno(status);
 
 	return status;
 }
 
+/*
+ * Calculate how many bytes are needed by the slot map.  Returns
+ * an error if the slot map file is too small.
+ */
+static int ocfs2_slot_map_physical_size(struct ocfs2_super *osb,
+					struct inode *inode,
+					unsigned long long *bytes)
+{
+	unsigned long long bytes_needed;
+
+	bytes_needed = osb->max_slots * sizeof(__le16);
+	if (bytes_needed > i_size_read(inode)) {
+		mlog(ML_ERROR,
+		     "Slot map file is too small!  (size %llu, needed %llu)\n",
+		     i_size_read(inode), bytes_needed);
+		return -ENOSPC;
+	}
+
+	*bytes = bytes_needed;
+	return 0;
+}
+
 /* try to find global node in the slot info. Returns
  * OCFS2_INVALID_SLOT if nothing is found. */
 static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
@@ -188,13 +221,22 @@ int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num,
 
 static void __ocfs2_free_slot_info(struct ocfs2_slot_info *si)
 {
+	unsigned int i;
+
 	if (si == NULL)
 		return;
 
 	if (si->si_inode)
 		iput(si->si_inode);
-	if (si->si_bh)
-		brelse(si->si_bh);
+	if (si->si_bh) {
+		for (i = 0; i < si->si_blocks; i++) {
+			if (si->si_bh[i]) {
+				brelse(si->si_bh[i]);
+				si->si_bh[i] = NULL;
+			}
+		}
+		kfree(si->si_bh);
+	}
 
 	kfree(si);
 }
@@ -225,12 +267,65 @@ int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num)
 	return ocfs2_update_disk_slots(osb, osb->slot_info);
 }
 
+static int ocfs2_map_slot_buffers(struct ocfs2_super *osb,
+				  struct ocfs2_slot_info *si)
+{
+	int status = 0;
+	u64 blkno;
+	unsigned long long blocks, bytes;
+	unsigned int i;
+	struct buffer_head *bh;
+
+	status = ocfs2_slot_map_physical_size(osb, si->si_inode, &bytes);
+	if (status)
+		goto bail;
+
+	blocks = ocfs2_blocks_for_bytes(si->si_inode->i_sb, bytes);
+	BUG_ON(blocks > UINT_MAX);
+	si->si_blocks = blocks;
+	if (!si->si_blocks)
+		goto bail;
+
+	mlog(0, "Slot map needs %u buffers for %llu bytes\n",
+	     si->si_blocks, bytes);
+
+	si->si_bh = kzalloc(sizeof(struct buffer_head *) * si->si_blocks,
+			    GFP_KERNEL);
+	if (!si->si_bh) {
+		status = -ENOMEM;
+		mlog_errno(status);
+		goto bail;
+	}
+
+	for (i = 0; i < si->si_blocks; i++) {
+		status = ocfs2_extent_map_get_blocks(si->si_inode, i,
+						     &blkno, NULL, NULL);
+		if (status < 0) {
+			mlog_errno(status);
+			goto bail;
+		}
+
+		mlog(0, "Reading slot map block %u@%llu\n", i,
+		     (unsigned long long)blkno);
+
+		bh = NULL;  /* Acquire a fresh bh */
+		status = ocfs2_read_block(osb, blkno, &bh, 0, si->si_inode);
+		if (status < 0) {
+			mlog_errno(status);
+			goto bail;
+		}
+
+		si->si_bh[i] = bh;
+	}
+
+bail:
+	return status;
+}
+
 int ocfs2_init_slot_info(struct ocfs2_super *osb)
 {
 	int status, i;
-	u64 blkno;
 	struct inode *inode = NULL;
-	struct buffer_head *bh = NULL;
 	struct ocfs2_slot_info *si;
 
 	si = kzalloc(sizeof(struct ocfs2_slot_info), GFP_KERNEL);
@@ -254,20 +349,13 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 		goto bail;
 	}
 
-	status = ocfs2_extent_map_get_blocks(inode, 0ULL, &blkno, NULL, NULL);
-	if (status < 0) {
-		mlog_errno(status);
-		goto bail;
-	}
-
-	status = ocfs2_read_block(osb, blkno, &bh, 0, inode);
+	si->si_inode = inode;
+	status = ocfs2_map_slot_buffers(osb, si);
 	if (status < 0) {
 		mlog_errno(status);
 		goto bail;
 	}
 
-	si->si_inode = inode;
-	si->si_bh = bh;
 	osb->slot_info = (struct ocfs2_slot_info *)si;
 bail:
 	if (status < 0 && si)
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 05/62] ocfs2: De-magic the in-memory slot map.
  2008-04-02 20:14       ` [Ocfs2-devel] [PATCH 04/62] ocfs2: slot_map I/O based on max_slots Mark Fasheh
@ 2008-04-02 20:14         ` Mark Fasheh
  2008-04-02 20:14           ` [Ocfs2-devel] [PATCH 06/62] ocfs2: Define the contents of the slot_map file Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The in-memory slot map uses the same magic as the on-disk one: a special
value marks a slot as invalid, and that value depends on the size of
certain types.

Write a new in-memory map that keeps validity in a separate field.  Outside
of the I/O functions, OCFS2_INVALID_SLOT now means only what it is supposed
to mean, and it is no longer tied to the type size.

This also means that only the I/O functions refer to 16-bit quantities.

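The following minimal userspace sketch shows the split between the in-memory and on-disk forms. It assumes OCFS2_INVALID_SLOT is -1, so the on-disk sentinel is 0xFFFF, as the (u16) cast in the patch implies. It ignores byte order, using a host uint16_t where the kernel uses __le16 with le16_to_cpu(). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed on-disk sentinel: (u16)OCFS2_INVALID_SLOT, i.e. 0xFFFF. */
#define DISK_INVALID_SLOT ((uint16_t)0xFFFF)

/* In-memory slot: validity lives in its own field, not in a magic value. */
struct slot {
	int sl_valid;
	unsigned int sl_node_num;
};

/* Disk -> memory: the only place that knows about the 16-bit sentinel. */
static void slots_from_disk(struct slot *slots, const uint16_t *disk,
			    unsigned int num_slots)
{
	for (unsigned int i = 0; i < num_slots; i++) {
		if (disk[i] == DISK_INVALID_SLOT) {
			slots[i].sl_valid = 0;
		} else {
			slots[i].sl_valid = 1;
			slots[i].sl_node_num = disk[i];
		}
	}
}

/* Memory -> disk: invalid slots are written back as the sentinel. */
static void slots_to_disk(uint16_t *disk, const struct slot *slots,
			  unsigned int num_slots)
{
	for (unsigned int i = 0; i < num_slots; i++)
		disk[i] = slots[i].sl_valid ? (uint16_t)slots[i].sl_node_num
					    : DISK_INVALID_SLOT;
}

/* Lookup reports "not found" as -ENOENT instead of a sentinel slot. */
static int node_num_to_slot(const struct slot *slots, unsigned int num_slots,
			    unsigned int node_num)
{
	for (unsigned int i = 0; i < num_slots; i++)
		if (slots[i].sl_valid && slots[i].sl_node_num == node_num)
			return (int)i;
	return -ENOENT;
}
```

Node 0 is a legitimate node number. Keeping validity in its own field means no node value has to double as a marker.
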
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/journal.c  |    2 +-
 fs/ocfs2/ocfs2.h    |    6 +-
 fs/ocfs2/slot_map.c |  130 ++++++++++++++++++++++++++++-----------------------
 fs/ocfs2/slot_map.h |    2 +-
 4 files changed, 77 insertions(+), 63 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index ca4c0ea..bffd2d7 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -71,7 +71,7 @@ static int ocfs2_commit_thread(void *arg);
  */
 
 struct ocfs2_recovery_map {
-	int rm_used;
+	unsigned int rm_used;
 	unsigned int *rm_entries;
 };
 
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index c6ed8c3..95f783d 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -216,10 +216,10 @@ struct ocfs2_super
 	unsigned long s_mount_opt;
 	unsigned int s_atime_quantum;
 
-	u16 max_slots;
+	unsigned int max_slots;
 	s16 node_num;
-	s16 slot_num;
-	s16 preferred_slot;
+	int slot_num;
+	int preferred_slot;
 	int s_sectsize_bits;
 	int s_clustersize;
 	int s_clustersize_bits;
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index 5bddee1..65a61bf 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -42,21 +42,41 @@
 
 #include "buffer_head_io.h"
 
+
+struct ocfs2_slot {
+	int sl_valid;
+	unsigned int sl_node_num;
+};
+
 struct ocfs2_slot_info {
 	struct inode *si_inode;
 	unsigned int si_blocks;
 	struct buffer_head **si_bh;
 	unsigned int si_num_slots;
-	unsigned int si_size;
-	s16 si_global_node_nums[OCFS2_MAX_SLOTS];
+	struct ocfs2_slot *si_slots;
 };
 
 
-static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
-				    s16 global);
-static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
-			      s16 slot_num,
-			      s16 node_num);
+static int __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
+				    unsigned int node_num);
+
+static void ocfs2_invalidate_slot(struct ocfs2_slot_info *si,
+				  int slot_num)
+{
+	BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots));
+	si->si_slots[slot_num].sl_valid = 0;
+}
+
+static void ocfs2_set_slot(struct ocfs2_slot_info *si,
+			   int slot_num, unsigned int node_num)
+{
+	BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots));
+	BUG_ON((node_num == O2NM_INVALID_NODE_NUM) ||
+	       (node_num >= O2NM_MAX_NODES));
+
+	si->si_slots[slot_num].sl_valid = 1;
+	si->si_slots[slot_num].sl_node_num = node_num;
+}
 
 /*
  * Post the slot information on disk into our slot_info struct.
@@ -71,8 +91,12 @@ static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 	 * should've made sure we have the most recent copy. */
 	disk_info = (__le16 *) si->si_bh[0]->b_data;
 
-	for (i = 0; i < si->si_size; i++)
-		si->si_global_node_nums[i] = le16_to_cpu(disk_info[i]);
+	for (i = 0; i < si->si_num_slots; i++) {
+		if (le16_to_cpu(disk_info[i]) == (u16)OCFS2_INVALID_SLOT)
+			ocfs2_invalidate_slot(si, i);
+		else
+			ocfs2_set_slot(si, i, le16_to_cpu(disk_info[i]));
+	}
 }
 
 int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
@@ -114,8 +138,13 @@ static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
 	__le16 *disk_info = (__le16 *) si->si_bh[0]->b_data;
 
 	spin_lock(&osb->osb_lock);
-	for (i = 0; i < si->si_size; i++)
-		disk_info[i] = cpu_to_le16(si->si_global_node_nums[i]);
+	for (i = 0; i < si->si_num_slots; i++) {
+		if (si->si_slots[i].sl_valid)
+			disk_info[i] =
+				cpu_to_le16(si->si_slots[i].sl_node_num);
+		else
+			disk_info[i] = cpu_to_le16(OCFS2_INVALID_SLOT);
+	}
 	spin_unlock(&osb->osb_lock);
 
 	status = ocfs2_write_block(osb, si->si_bh[0], si->si_inode);
@@ -147,39 +176,39 @@ static int ocfs2_slot_map_physical_size(struct ocfs2_super *osb,
 	return 0;
 }
 
-/* try to find global node in the slot info. Returns
- * OCFS2_INVALID_SLOT if nothing is found. */
-static s16 __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
-				    s16 global)
+/* try to find global node in the slot info. Returns -ENOENT
+ * if nothing is found. */
+static int __ocfs2_node_num_to_slot(struct ocfs2_slot_info *si,
+				    unsigned int node_num)
 {
-	int i;
-	s16 ret = OCFS2_INVALID_SLOT;
+	int i, ret = -ENOENT;
 
 	for(i = 0; i < si->si_num_slots; i++) {
-		if (global == si->si_global_node_nums[i]) {
-			ret = (s16) i;
+		if (si->si_slots[i].sl_valid &&
+		    (node_num == si->si_slots[i].sl_node_num)) {
+			ret = i;
 			break;
 		}
 	}
+
 	return ret;
 }
 
-static s16 __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
-				   s16 preferred)
+static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si,
+				   int preferred)
 {
-	int i;
-	s16 ret = OCFS2_INVALID_SLOT;
+	int i, ret = -ENOSPC;
 
-	if (preferred >= 0 && preferred < si->si_num_slots) {
-		if (OCFS2_INVALID_SLOT == si->si_global_node_nums[preferred]) {
+	if ((preferred >= 0) && (preferred < si->si_num_slots)) {
+		if (!si->si_slots[preferred].sl_valid) {
 			ret = preferred;
 			goto out;
 		}
 	}
 
 	for(i = 0; i < si->si_num_slots; i++) {
-		if (OCFS2_INVALID_SLOT == si->si_global_node_nums[i]) {
-			ret = (s16) i;
+		if (!si->si_slots[i].sl_valid) {
+			ret = i;
 			break;
 		}
 	}
@@ -189,16 +218,13 @@ out:
 
 int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num)
 {
-	s16 slot;
+	int slot;
 	struct ocfs2_slot_info *si = osb->slot_info;
 
 	spin_lock(&osb->osb_lock);
 	slot = __ocfs2_node_num_to_slot(si, node_num);
 	spin_unlock(&osb->osb_lock);
 
-	if (slot == OCFS2_INVALID_SLOT)
-		return -ENOENT;
-
 	return slot;
 }
 
@@ -212,10 +238,10 @@ int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num,
 	BUG_ON(slot_num < 0);
 	BUG_ON(slot_num > osb->max_slots);
 
-	if (si->si_global_node_nums[slot_num] == OCFS2_INVALID_SLOT)
+	if (!si->si_slots[slot_num].sl_valid)
 		return -ENOENT;
 
-	*node_num = si->si_global_node_nums[slot_num];
+	*node_num = si->si_slots[slot_num].sl_node_num;
 	return 0;
 }
 
@@ -241,19 +267,7 @@ static void __ocfs2_free_slot_info(struct ocfs2_slot_info *si)
 	kfree(si);
 }
 
-static void __ocfs2_fill_slot(struct ocfs2_slot_info *si,
-			      s16 slot_num,
-			      s16 node_num)
-{
-	BUG_ON(slot_num == OCFS2_INVALID_SLOT);
-	BUG_ON(slot_num >= si->si_num_slots);
-	BUG_ON((node_num != O2NM_INVALID_NODE_NUM) &&
-	       (node_num >= O2NM_MAX_NODES));
-
-	si->si_global_node_nums[slot_num] = node_num;
-}
-
-int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num)
+int ocfs2_clear_slot(struct ocfs2_super *osb, int slot_num)
 {
 	struct ocfs2_slot_info *si = osb->slot_info;
 
@@ -261,7 +275,7 @@ int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num)
 		return 0;
 
 	spin_lock(&osb->osb_lock);
-	__ocfs2_fill_slot(si, slot_num, OCFS2_INVALID_SLOT);
+	ocfs2_invalidate_slot(si, slot_num);
 	spin_unlock(&osb->osb_lock);
 
 	return ocfs2_update_disk_slots(osb, osb->slot_info);
@@ -324,11 +338,13 @@ bail:
 
 int ocfs2_init_slot_info(struct ocfs2_super *osb)
 {
-	int status, i;
+	int status;
 	struct inode *inode = NULL;
 	struct ocfs2_slot_info *si;
 
-	si = kzalloc(sizeof(struct ocfs2_slot_info), GFP_KERNEL);
+	si = kzalloc(sizeof(struct ocfs2_slot_info) +
+		     (sizeof(struct ocfs2_slot) * osb->max_slots),
+		     GFP_KERNEL);
 	if (!si) {
 		status = -ENOMEM;
 		mlog_errno(status);
@@ -336,10 +352,8 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 	}
 
 	si->si_num_slots = osb->max_slots;
-	si->si_size = OCFS2_MAX_SLOTS;
-
-	for(i = 0; i < si->si_num_slots; i++)
-		si->si_global_node_nums[i] = OCFS2_INVALID_SLOT;
+	si->si_slots = (struct ocfs2_slot *)((char *)si +
+					     sizeof(struct ocfs2_slot_info));
 
 	inode = ocfs2_get_system_file_inode(osb, SLOT_MAP_SYSTEM_INODE,
 					    OCFS2_INVALID_SLOT);
@@ -375,7 +389,7 @@ void ocfs2_free_slot_info(struct ocfs2_super *osb)
 int ocfs2_find_slot(struct ocfs2_super *osb)
 {
 	int status;
-	s16 slot;
+	int slot;
 	struct ocfs2_slot_info *si;
 
 	mlog_entry_void();
@@ -390,11 +404,11 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 	 * own journal recovery? Possibly not, though we certainly
 	 * need to warn to the user */
 	slot = __ocfs2_node_num_to_slot(si, osb->node_num);
-	if (slot == OCFS2_INVALID_SLOT) {
+	if (slot < 0) {
 		/* if no slot yet, then just take 1st available
 		 * one. */
 		slot = __ocfs2_find_empty_slot(si, osb->preferred_slot);
-		if (slot == OCFS2_INVALID_SLOT) {
+		if (slot < 0) {
 			spin_unlock(&osb->osb_lock);
 			mlog(ML_ERROR, "no free slots available!\n");
 			status = -EINVAL;
@@ -404,7 +418,7 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 		mlog(ML_NOTICE, "slot %d is already allocated to this node!\n",
 		     slot);
 
-	__ocfs2_fill_slot(si, slot, osb->node_num);
+	ocfs2_set_slot(si, slot, osb->node_num);
 	osb->slot_num = slot;
 	spin_unlock(&osb->osb_lock);
 
@@ -430,7 +444,7 @@ void ocfs2_put_slot(struct ocfs2_super *osb)
 	spin_lock(&osb->osb_lock);
 	ocfs2_update_slot_info(si);
 
-	__ocfs2_fill_slot(si, osb->slot_num, OCFS2_INVALID_SLOT);
+	ocfs2_invalidate_slot(si, osb->slot_num);
 	osb->slot_num = OCFS2_INVALID_SLOT;
 	spin_unlock(&osb->osb_lock);
 
diff --git a/fs/ocfs2/slot_map.h b/fs/ocfs2/slot_map.h
index 5118e89..601c95f 100644
--- a/fs/ocfs2/slot_map.h
+++ b/fs/ocfs2/slot_map.h
@@ -39,6 +39,6 @@ int ocfs2_node_num_to_slot(struct ocfs2_super *osb, unsigned int node_num);
 int ocfs2_slot_to_node_num_locked(struct ocfs2_super *osb, int slot_num,
 				  unsigned int *node_num);
 
-int ocfs2_clear_slot(struct ocfs2_super *osb, s16 slot_num);
+int ocfs2_clear_slot(struct ocfs2_super *osb, int slot_num);
 
 #endif
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 06/62] ocfs2: Define the contents of the slot_map file.
  2008-04-02 20:14         ` [Ocfs2-devel] [PATCH 05/62] ocfs2: De-magic the in-memory slot map Mark Fasheh
@ 2008-04-02 20:14           ` Mark Fasheh
  2008-04-02 20:14             ` [Ocfs2-devel] [PATCH 07/62] ocfs2: New slot map format Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The slot map file is merely an array of __le16.  Wrap it in a structure for
cleaner reference.
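
The structure only names the layout. Entry i of the slot map is a little-endian 16-bit value at byte offset 2 * i of the first block. The small sketch below uses hypothetical helper names. It decodes entries byte by byte, so it works regardless of host byte order. It also checks at compile time that 255 entries (OCFS2_MAX_SLOTS in this series) fit in a 512-byte block, since 255 * 2 = 510.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_BLOCKSIZE 512u  /* smallest ocfs2 block size, per the patch comment */
#define MAX_SLOTS     255u  /* OCFS2_MAX_SLOTS at the time of this series */

/* 255 * sizeof(__le16) == 510 bytes, which fits in one 512-byte block. */
_Static_assert(MAX_SLOTS * sizeof(uint16_t) <= MIN_BLOCKSIZE,
	       "slot map must fit in one minimum-size block");

/* Read slot i from a raw block of little-endian 16-bit entries. */
static unsigned int slot_map_get(const unsigned char *block, unsigned int i)
{
	return (unsigned int)block[2 * i] |
	       ((unsigned int)block[2 * i + 1] << 8);
}

/* Store v as slot i, low byte first. */
static void slot_map_put(unsigned char *block, unsigned int i, uint16_t v)
{
	block[2 * i] = v & 0xff;
	block[2 * i + 1] = v >> 8;
}
```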

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/ocfs2_fs.h |   12 ++++++++++++
 fs/ocfs2/slot_map.c |   15 ++++++++-------
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 3633edd..3299116 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -475,6 +475,18 @@ struct ocfs2_extent_block
 };
 
 /*
+ * On disk slot map for OCFS2.  This defines the contents of the "slot_map"
+ * system file.
+ */
+struct ocfs2_slot_map {
+/*00*/	__le16 sm_slots[0];
+/*
+ * Actual on-disk size is one block.  OCFS2_MAX_SLOTS is 255,
+ * 255 * sizeof(__le16) == 510B, within the 512B block minimum blocksize.
+ */
+};
+
+/*
  * On disk superblock for OCFS2
  * Note that it is contained inside an ocfs2_dinode, so all offsets
  * are relative to the start of ocfs2_dinode.id2.
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index 65a61bf..e7e7a74 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -85,17 +85,17 @@ static void ocfs2_set_slot(struct ocfs2_slot_info *si,
 static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 {
 	int i;
-	__le16 *disk_info;
+	struct ocfs2_slot_map *sm;
 
 	/* we don't read the slot block here as ocfs2_super_lock
 	 * should've made sure we have the most recent copy. */
-	disk_info = (__le16 *) si->si_bh[0]->b_data;
+	sm = (struct ocfs2_slot_map *)si->si_bh[0]->b_data;
 
 	for (i = 0; i < si->si_num_slots; i++) {
-		if (le16_to_cpu(disk_info[i]) == (u16)OCFS2_INVALID_SLOT)
+		if (le16_to_cpu(sm->sm_slots[i]) == (u16)OCFS2_INVALID_SLOT)
 			ocfs2_invalidate_slot(si, i);
 		else
-			ocfs2_set_slot(si, i, le16_to_cpu(disk_info[i]));
+			ocfs2_set_slot(si, i, le16_to_cpu(sm->sm_slots[i]));
 	}
 }
 
@@ -135,15 +135,16 @@ static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
 				   struct ocfs2_slot_info *si)
 {
 	int status, i;
-	__le16 *disk_info = (__le16 *) si->si_bh[0]->b_data;
+	struct ocfs2_slot_map *sm;
 
 	spin_lock(&osb->osb_lock);
+	sm = (struct ocfs2_slot_map *)si->si_bh[0]->b_data;
 	for (i = 0; i < si->si_num_slots; i++) {
 		if (si->si_slots[i].sl_valid)
-			disk_info[i] =
+			sm->sm_slots[i] =
 				cpu_to_le16(si->si_slots[i].sl_node_num);
 		else
-			disk_info[i] = cpu_to_le16(OCFS2_INVALID_SLOT);
+			sm->sm_slots[i] = cpu_to_le16(OCFS2_INVALID_SLOT);
 	}
 	spin_unlock(&osb->osb_lock);
 
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 07/62] ocfs2: New slot map format
@ 2008-04-02 20:14 Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The old slot map had a few limitations:

- It was limited to one block, so the maximum slot count was 255.
- Each slot was a signed 16-bit value, limiting node numbers to INT16_MAX.
- An empty slot was marked by the magic value 0xFFFF (-1).

The new slot map format provides 32-bit node numbers (up to UINT32_MAX), a
separate field to mark a slot in use, and extra room to grow.  The slot
map is now bounded by i_size, not by a single block.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/ocfs2.h    |    7 +++
 fs/ocfs2/ocfs2_fs.h |   31 +++++++++++++-
 fs/ocfs2/slot_map.c |  110 +++++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 133 insertions(+), 15 deletions(-)
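
The limits listed above follow from the two entry sizes. The sketch below models struct ocfs2_extended_slot with user-space types (the sketch_* names are hypothetical) and checks its 8-byte layout, how many entries of each format fit in one block (the same division ocfs2_map_slot_buffers() performs), and which node numbers the old signed 16-bit encoding could hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space model of struct ocfs2_extended_slot (hypothetical name). */
struct sketch_extended_slot {
	uint8_t  es_valid;		/* nonzero when the slot is in use */
	uint8_t  es_reserved1[3];	/* padding, room to grow */
	uint32_t es_node_num;		/* little-endian on disk */
};

/* 1 + 3 + 4 bytes with no compiler padding: 8 bytes per entry. */
_Static_assert(sizeof(struct sketch_extended_slot) == 8,
	       "extended slot entry should be 8 bytes");
_Static_assert(offsetof(struct sketch_extended_slot, es_node_num) == 4,
	       "node number should start at byte 4");

/* Entries per block for the extended (8-byte) or old (2-byte) format. */
static unsigned int sketch_slots_per_block(unsigned int blocksize,
					   int extended)
{
	return blocksize / (extended ? sizeof(struct sketch_extended_slot)
				     : sizeof(uint16_t));
}

/* Serialize one slot; validity is stored apart from the node number. */
static void sketch_encode_extended_slot(unsigned char out[8], int valid,
					uint32_t node_num)
{
	memset(out, 0, 8);
	out[0] = valid ? 1 : 0;
	if (!valid)
		return;		/* node number is only written for used slots */
	out[4] = (unsigned char)(node_num & 0xFF);
	out[5] = (unsigned char)((node_num >> 8) & 0xFF);
	out[6] = (unsigned char)((node_num >> 16) & 0xFF);
	out[7] = (unsigned char)((node_num >> 24) & 0xFF);
}

/* Old format: signed 16-bit entries, -1 reserved as the empty marker. */
static int sketch_old_format_can_store(uint32_t node_num)
{
	return node_num <= INT16_MAX;
}
```

With 4096-byte blocks the extended format fits 512 slots per block, and node numbers above INT16_MAX, which the old format could not hold, encode without touching the valid byte.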

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 95f783d..f78e9ed 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -374,6 +374,13 @@ static inline int ocfs2_mount_local(struct ocfs2_super *osb)
 	return (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT);
 }
 
+static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb)
+{
+	return (osb->s_feature_incompat &
+		OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP);
+}
+
+
 #define OCFS2_IS_VALID_DINODE(ptr)					\
 	(!strcmp((ptr)->i_signature, OCFS2_INODE_SIGNATURE))
 
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index 3299116..c495023 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -88,7 +88,8 @@
 #define OCFS2_FEATURE_COMPAT_SUPP	OCFS2_FEATURE_COMPAT_BACKUP_SB
 #define OCFS2_FEATURE_INCOMPAT_SUPP	(OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT \
 					 | OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC \
-					 | OCFS2_FEATURE_INCOMPAT_INLINE_DATA)
+					 | OCFS2_FEATURE_INCOMPAT_INLINE_DATA \
+					 | OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP)
 #define OCFS2_FEATURE_RO_COMPAT_SUPP	OCFS2_FEATURE_RO_COMPAT_UNWRITTEN
 
 /*
@@ -125,6 +126,10 @@
 /* Support for data packed into inode blocks */
 #define OCFS2_FEATURE_INCOMPAT_INLINE_DATA	0x0040
 
+/* Support for the extended slot map */
+#define OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP 0x100
+
+
 /*
  * backup superblock flag is used to indicate that this volume
  * has backup superblocks.
@@ -476,7 +481,8 @@ struct ocfs2_extent_block
 
 /*
  * On disk slot map for OCFS2.  This defines the contents of the "slot_map"
- * system file.
+ * system file.  A slot is valid if it contains a node number >= 0.  The
+ * value -1 (0xFFFF) is OCFS2_INVALID_SLOT.  This marks a slot empty.
  */
 struct ocfs2_slot_map {
 /*00*/	__le16 sm_slots[0];
@@ -486,6 +492,27 @@ struct ocfs2_slot_map {
  */
 };
 
+struct ocfs2_extended_slot {
+/*00*/	__u8	es_valid;
+	__u8	es_reserved1[3];
+	__le32	es_node_num;
+/*08*/
+};
+
+/*
+ * The extended slot map, used when OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP
+ * is set.  It separates out the valid marker from the node number, and
+ * has room to grow.  Unlike the old slot map, this format is defined by
+ * i_size.
+ */
+struct ocfs2_slot_map_extended {
+/*00*/	struct ocfs2_extended_slot se_slots[0];
+/*
+ * Actual size is i_size of the slot_map system file.  It should
+ * match s_max_slots * sizeof(struct ocfs2_extended_slot)
+ */
+};
+
 /*
  * On disk superblock for OCFS2
  * Note that it is contained inside an ocfs2_dinode, so all offsets
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index e7e7a74..63fb1b2 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -49,6 +49,8 @@ struct ocfs2_slot {
 };
 
 struct ocfs2_slot_info {
+	int si_extended;
+	int si_slots_per_block;
 	struct inode *si_inode;
 	unsigned int si_blocks;
 	struct buffer_head **si_bh;
@@ -78,17 +80,37 @@ static void ocfs2_set_slot(struct ocfs2_slot_info *si,
 	si->si_slots[slot_num].sl_node_num = node_num;
 }
 
+/* This version is for the extended slot map */
+static void ocfs2_update_slot_info_extended(struct ocfs2_slot_info *si)
+{
+	int b, i, slotno;
+	struct ocfs2_slot_map_extended *se;
+
+	slotno = 0;
+	for (b = 0; b < si->si_blocks; b++) {
+		se = (struct ocfs2_slot_map_extended *)si->si_bh[b]->b_data;
+		for (i = 0;
+		     (i < si->si_slots_per_block) &&
+		     (slotno < si->si_num_slots);
+		     i++, slotno++) {
+			if (se->se_slots[i].es_valid)
+				ocfs2_set_slot(si, slotno,
+					       le32_to_cpu(se->se_slots[i].es_node_num));
+			else
+				ocfs2_invalidate_slot(si, slotno);
+		}
+	}
+}
+
 /*
  * Post the slot information on disk into our slot_info struct.
  * Must be protected by osb_lock.
  */
-static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
+static void ocfs2_update_slot_info_old(struct ocfs2_slot_info *si)
 {
 	int i;
 	struct ocfs2_slot_map *sm;
 
-	/* we don't read the slot block here as ocfs2_super_lock
-	 * should've made sure we have the most recent copy. */
 	sm = (struct ocfs2_slot_map *)si->si_bh[0]->b_data;
 
 	for (i = 0; i < si->si_num_slots; i++) {
@@ -99,6 +121,18 @@ static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
 	}
 }
 
+static void ocfs2_update_slot_info(struct ocfs2_slot_info *si)
+{
+	/*
+	 * The slot data will have been refreshed when ocfs2_super_lock
+	 * was taken.
+	 */
+	if (si->si_extended)
+		ocfs2_update_slot_info_extended(si);
+	else
+		ocfs2_update_slot_info_old(si);
+}
+
 int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
 {
 	int ret;
@@ -131,13 +165,31 @@ int ocfs2_refresh_slot_info(struct ocfs2_super *osb)
 
 /* post the our slot info stuff into it's destination bh and write it
  * out. */
-static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
-				   struct ocfs2_slot_info *si)
+static void ocfs2_update_disk_slot_extended(struct ocfs2_slot_info *si,
+					    int slot_num,
+					    struct buffer_head **bh)
+{
+	int blkind = slot_num / si->si_slots_per_block;
+	int slotno = slot_num % si->si_slots_per_block;
+	struct ocfs2_slot_map_extended *se;
+
+	BUG_ON(blkind >= si->si_blocks);
+
+	se = (struct ocfs2_slot_map_extended *)si->si_bh[blkind]->b_data;
+	se->se_slots[slotno].es_valid = si->si_slots[slot_num].sl_valid;
+	if (si->si_slots[slot_num].sl_valid)
+		se->se_slots[slotno].es_node_num =
+			cpu_to_le32(si->si_slots[slot_num].sl_node_num);
+	*bh = si->si_bh[blkind];
+}
+
+static void ocfs2_update_disk_slot_old(struct ocfs2_slot_info *si,
+				       int slot_num,
+				       struct buffer_head **bh)
 {
-	int status, i;
+	int i;
 	struct ocfs2_slot_map *sm;
 
-	spin_lock(&osb->osb_lock);
 	sm = (struct ocfs2_slot_map *)si->si_bh[0]->b_data;
 	for (i = 0; i < si->si_num_slots; i++) {
 		if (si->si_slots[i].sl_valid)
@@ -146,9 +198,24 @@ static int ocfs2_update_disk_slots(struct ocfs2_super *osb,
 		else
 			sm->sm_slots[i] = cpu_to_le16(OCFS2_INVALID_SLOT);
 	}
+	*bh = si->si_bh[0];
+}
+
+static int ocfs2_update_disk_slot(struct ocfs2_super *osb,
+				  struct ocfs2_slot_info *si,
+				  int slot_num)
+{
+	int status;
+	struct buffer_head *bh;
+
+	spin_lock(&osb->osb_lock);
+	if (si->si_extended)
+		ocfs2_update_disk_slot_extended(si, slot_num, &bh);
+	else
+		ocfs2_update_disk_slot_old(si, slot_num, &bh);
 	spin_unlock(&osb->osb_lock);
 
-	status = ocfs2_write_block(osb, si->si_bh[0], si->si_inode);
+	status = ocfs2_write_block(osb, bh, si->si_inode);
 	if (status < 0)
 		mlog_errno(status);
 
@@ -165,7 +232,12 @@ static int ocfs2_slot_map_physical_size(struct ocfs2_super *osb,
 {
 	unsigned long long bytes_needed;
 
-	bytes_needed = osb->max_slots * sizeof(__le16);
+	if (ocfs2_uses_extended_slot_map(osb)) {
+		bytes_needed = osb->max_slots *
+			sizeof(struct ocfs2_extended_slot);
+	} else {
+		bytes_needed = osb->max_slots * sizeof(__le16);
+	}
 	if (bytes_needed > i_size_read(inode)) {
 		mlog(ML_ERROR,
 		     "Slot map file is too small!  (size %llu, needed %llu)\n",
@@ -279,7 +351,7 @@ int ocfs2_clear_slot(struct ocfs2_super *osb, int slot_num)
 	ocfs2_invalidate_slot(si, slot_num);
 	spin_unlock(&osb->osb_lock);
 
-	return ocfs2_update_disk_slots(osb, osb->slot_info);
+	return ocfs2_update_disk_slot(osb, osb->slot_info, slot_num);
 }
 
 static int ocfs2_map_slot_buffers(struct ocfs2_super *osb,
@@ -301,6 +373,16 @@ static int ocfs2_map_slot_buffers(struct ocfs2_super *osb,
 	if (!si->si_blocks)
 		goto bail;
 
+	if (si->si_extended)
+		si->si_slots_per_block =
+			(osb->sb->s_blocksize /
+			 sizeof(struct ocfs2_extended_slot));
+	else
+		si->si_slots_per_block = osb->sb->s_blocksize / sizeof(__le16);
+
+	/* The size checks above should ensure this */
+	BUG_ON((osb->max_slots / si->si_slots_per_block) > blocks);
+
 	mlog(0, "Slot map needs %u buffers for %llu bytes\n",
 	     si->si_blocks, bytes);
 
@@ -352,6 +434,7 @@ int ocfs2_init_slot_info(struct ocfs2_super *osb)
 		goto bail;
 	}
 
+	si->si_extended = ocfs2_uses_extended_slot_map(osb);
 	si->si_num_slots = osb->max_slots;
 	si->si_slots = (struct ocfs2_slot *)((char *)si +
 					     sizeof(struct ocfs2_slot_info));
@@ -425,7 +508,7 @@ int ocfs2_find_slot(struct ocfs2_super *osb)
 
 	mlog(0, "taking node slot %d\n", osb->slot_num);
 
-	status = ocfs2_update_disk_slots(osb, si);
+	status = ocfs2_update_disk_slot(osb, si, osb->slot_num);
 	if (status < 0)
 		mlog_errno(status);
 
@@ -436,7 +519,7 @@ bail:
 
 void ocfs2_put_slot(struct ocfs2_super *osb)
 {
-	int status;
+	int status, slot_num;
 	struct ocfs2_slot_info *si = osb->slot_info;
 
 	if (!si)
@@ -445,11 +528,12 @@ void ocfs2_put_slot(struct ocfs2_super *osb)
 	spin_lock(&osb->osb_lock);
 	ocfs2_update_slot_info(si);
 
+	slot_num = osb->slot_num;
 	ocfs2_invalidate_slot(si, osb->slot_num);
 	osb->slot_num = OCFS2_INVALID_SLOT;
 	spin_unlock(&osb->osb_lock);
 
-	status = ocfs2_update_disk_slots(osb, si);
+	status = ocfs2_update_disk_slot(osb, si, slot_num);
 	if (status < 0) {
 		mlog_errno(status);
 		goto bail;
-- 
1.5.4.1
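
With the extended map spread across several blocks, ocfs2_update_disk_slot_extended() finds the block holding a slot by division and remainder, and only that buffer is written back; ocfs2_slot_map_physical_size() checks that i_size covers max_slots entries. A user-space sketch of both computations follows, with hypothetical sketch_* names and assert() standing in for BUG_ON().

```c
#include <assert.h>

/* Where a slot lives inside the multi-block slot_map file. */
struct sketch_slot_pos {
	unsigned int blkind;	/* block index within the file */
	unsigned int slotno;	/* entry index within that block */
};

static struct sketch_slot_pos sketch_locate_slot(unsigned int slot_num,
						 unsigned int slots_per_block,
						 unsigned int num_blocks)
{
	struct sketch_slot_pos pos;

	assert(slots_per_block > 0);
	pos.blkind = slot_num / slots_per_block;
	pos.slotno = slot_num % slots_per_block;
	/* Mirrors BUG_ON(blkind >= si->si_blocks) in the patch. */
	assert(pos.blkind < num_blocks);
	return pos;
}

/* Bytes needed for max_slots entries: 8 (extended) or 2 (old) each. */
static unsigned long long sketch_bytes_needed(unsigned int max_slots,
					      int extended)
{
	return (unsigned long long)max_slots * (extended ? 8u : 2u);
}

/* The slot map is usable only if its i_size covers every entry. */
static int sketch_slot_map_size_ok(unsigned int max_slots, int extended,
				   unsigned long long i_size)
{
	return sketch_bytes_needed(max_slots, extended) <= i_size;
}
```

For example, with 4096-byte blocks (512 extended slots per block), slot 700 is entry 188 of block 1, so only block 1 needs to be written when that slot changes.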

* [Ocfs2-devel] [PATCH 08/62] ocfs2: Separate out dlm lock functions.
@ 2008-04-02 20:14 Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

This is the first in a series of patches to isolate ocfs2 from the
underlying cluster stack. Here we wrap the dlm locking functions with
ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
callbacks, the callbacks can be dropped from the filesystem-visible
functions.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/Makefile    |    1 +
 fs/ocfs2/dlmglue.c   |  110 +++++++++++++++++++++++++++----------------------
 fs/ocfs2/dlmglue.h   |    3 +
 fs/ocfs2/stackglue.c |   65 +++++++++++++++++++++++++++++
 fs/ocfs2/stackglue.h |   45 ++++++++++++++++++++
 fs/ocfs2/super.c     |    4 ++
 6 files changed, 179 insertions(+), 49 deletions(-)
 create mode 100644 fs/ocfs2/stackglue.c
 create mode 100644 fs/ocfs2/stackglue.h
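
The pattern introduced here registers one callback table at module init (o2cb_get_stack()), and ocfs2_dlm_lock() then supplies those callbacks on every call, so callers pass only their astarg. A minimal user-space sketch of the same shape follows, assuming a hypothetical backend lock function that takes explicit callbacks the way dlmlock() does. The sketch_* names are not kernel APIs, and the unlock path is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Callback table, modeled on struct ocfs2_locking_protocol. */
struct sketch_locking_protocol {
	void (*lp_lock_ast)(void *astarg);
	void (*lp_blocking_ast)(void *astarg, int level);
};

typedef void (*sketch_ast_fn)(void *astarg);
typedef void (*sketch_bast_fn)(void *astarg, int level);

/* Hypothetical stack call that needs explicit callbacks. */
static int sketch_backend_lock(int mode, sketch_ast_fn ast, void *astarg,
			       sketch_bast_fn bast)
{
	(void)mode;
	(void)bast;
	ast(astarg);	/* pretend the lock was granted immediately */
	return 0;
}

/* Registered once, like the static lproto pointer in stackglue.c. */
static struct sketch_locking_protocol *sketch_lproto;

static void sketch_get_stack(struct sketch_locking_protocol *proto)
{
	assert(proto != NULL);
	sketch_lproto = proto;
}

static void sketch_put_stack(void)
{
	sketch_lproto = NULL;
}

/* Filesystem-facing wrapper: callers pass only their astarg. */
static int sketch_fs_lock(int mode, void *astarg)
{
	assert(sketch_lproto != NULL);	/* BUG_ON(lproto == NULL) */
	return sketch_backend_lock(mode, sketch_lproto->lp_lock_ast, astarg,
				   sketch_lproto->lp_blocking_ast);
}

/* Example callbacks: count grants in the int passed as astarg. */
static void sketch_count_ast(void *astarg)
{
	(*(int *)astarg)++;
}

static void sketch_ignore_bast(void *astarg, int level)
{
	(void)astarg;
	(void)level;
}

static struct sketch_locking_protocol sketch_lproto_example = {
	.lp_lock_ast	 = sketch_count_ast,
	.lp_blocking_ast = sketch_ignore_bast,
};
```

Keeping the callbacks in one table means a different stack backend only has to change what the wrapper forwards to, not every call site.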

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index 4d4ce48..3ba64af 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -24,6 +24,7 @@ ocfs2-objs := \
 	namei.o 		\
 	resize.o		\
 	slot_map.o 		\
+	stackglue.o		\
 	suballoc.o 		\
 	super.o 		\
 	symlink.o 		\
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 15a5167..aea3bef 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -53,6 +53,7 @@
 #include "heartbeat.h"
 #include "inode.h"
 #include "journal.h"
+#include "stackglue.h"
 #include "slot_map.h"
 #include "super.h"
 #include "uptodate.h"
@@ -888,22 +889,21 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 	lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
-	status = dlmlock(osb->dlm,
-			 level,
-			 &lockres->l_lksb,
-			 dlm_flags,
-			 lockres->l_name,
-			 OCFS2_LOCK_ID_MAX_LEN - 1,
-			 ocfs2_locking_ast,
-			 lockres,
-			 ocfs2_blocking_ast);
+	status = ocfs2_dlm_lock(osb->dlm,
+				level,
+				&lockres->l_lksb,
+				dlm_flags,
+				lockres->l_name,
+				OCFS2_LOCK_ID_MAX_LEN - 1,
+				lockres);
 	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("dlmlock", status, lockres);
+		ocfs2_log_dlm_error("ocfs2_dlm_lock", status, lockres);
 		ret = -EINVAL;
 		ocfs2_recover_from_dlm_error(lockres, 1);
 	}
 
-	mlog(0, "lock %s, successfull return from dlmlock\n", lockres->l_name);
+	mlog(0, "lock %s, successfull return from ocfs2_dlm_lock\n",
+	     lockres->l_name);
 
 bail:
 	mlog_exit(ret);
@@ -1091,29 +1091,27 @@ again:
 		     lockres->l_name, lockres->l_level, level);
 
 		/* call dlm_lock to upgrade lock now */
-		status = dlmlock(osb->dlm,
-				 level,
-				 &lockres->l_lksb,
-				 lkm_flags,
-				 lockres->l_name,
-				 OCFS2_LOCK_ID_MAX_LEN - 1,
-				 ocfs2_locking_ast,
-				 lockres,
-				 ocfs2_blocking_ast);
+		status = ocfs2_dlm_lock(osb->dlm,
+					level,
+					&lockres->l_lksb,
+					lkm_flags,
+					lockres->l_name,
+					OCFS2_LOCK_ID_MAX_LEN - 1,
+					lockres);
 		if (status != DLM_NORMAL) {
 			if ((lkm_flags & LKM_NOQUEUE) &&
 			    (status == DLM_NOTQUEUED))
 				ret = -EAGAIN;
 			else {
-				ocfs2_log_dlm_error("dlmlock", status,
-						    lockres);
+				ocfs2_log_dlm_error("ocfs2_dlm_lock",
+						    status, lockres);
 				ret = -EINVAL;
 			}
 			ocfs2_recover_from_dlm_error(lockres, 1);
 			goto out;
 		}
 
-		mlog(0, "lock %s, successfull return from dlmlock\n",
+		mlog(0, "lock %s, successfull return from ocfs2_dlm_lock\n",
 		     lockres->l_name);
 
 		/* At this point we've gone inside the dlm and need to
@@ -1503,14 +1501,14 @@ int ocfs2_file_lock(struct file *file, int ex, int trylock)
 	lockres_add_mask_waiter(lockres, &mw, OCFS2_LOCK_BUSY, 0);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
-	ret = dlmlock(osb->dlm, level, &lockres->l_lksb, lkm_flags,
-		      lockres->l_name, OCFS2_LOCK_ID_MAX_LEN - 1,
-		      ocfs2_locking_ast, lockres, ocfs2_blocking_ast);
+	ret = ocfs2_dlm_lock(osb->dlm, level, &lockres->l_lksb, lkm_flags,
+			     lockres->l_name, OCFS2_LOCK_ID_MAX_LEN - 1,
+			     lockres);
 	if (ret != DLM_NORMAL) {
 		if (trylock && ret == DLM_NOTQUEUED)
 			ret = -EAGAIN;
 		else {
-			ocfs2_log_dlm_error("dlmlock", ret, lockres);
+			ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres);
 			ret = -EINVAL;
 		}
 
@@ -2699,15 +2697,15 @@ static int ocfs2_drop_lock(struct ocfs2_super *osb,
 
 	mlog(0, "lock %s\n", lockres->l_name);
 
-	status = dlmunlock(osb->dlm, &lockres->l_lksb, lkm_flags,
-			   ocfs2_unlock_ast, lockres);
+	status = ocfs2_dlm_unlock(osb->dlm, &lockres->l_lksb, lkm_flags,
+				  lockres);
 	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("dlmunlock", status, lockres);
+		ocfs2_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
 		mlog(ML_ERROR, "lockres flags: %lu\n", lockres->l_flags);
 		dlm_print_one_lock(lockres->l_lksb.lockid);
 		BUG();
 	}
-	mlog(0, "lock %s, successfull return from dlmunlock\n",
+	mlog(0, "lock %s, successfull return from ocfs2_dlm_unlock\n",
 	     lockres->l_name);
 
 	ocfs2_wait_on_busy_lock(lockres);
@@ -2832,17 +2830,15 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
 	if (lvb)
 		dlm_flags |= LKM_VALBLK;
 
-	status = dlmlock(osb->dlm,
-			 new_level,
-			 &lockres->l_lksb,
-			 dlm_flags,
-			 lockres->l_name,
-			 OCFS2_LOCK_ID_MAX_LEN - 1,
-			 ocfs2_locking_ast,
-			 lockres,
-			 ocfs2_blocking_ast);
+	status = ocfs2_dlm_lock(osb->dlm,
+				new_level,
+				&lockres->l_lksb,
+				dlm_flags,
+				lockres->l_name,
+				OCFS2_LOCK_ID_MAX_LEN - 1,
+				lockres);
 	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("dlmlock", status, lockres);
+		ocfs2_log_dlm_error("ocfs2_dlm_lock", status, lockres);
 		ret = -EINVAL;
 		ocfs2_recover_from_dlm_error(lockres, 1);
 		goto bail;
@@ -2854,7 +2850,7 @@ bail:
 	return ret;
 }
 
-/* returns 1 when the caller should unlock and call dlmunlock */
+/* returns 1 when the caller should unlock and call ocfs2_dlm_unlock */
 static int ocfs2_prepare_cancel_convert(struct ocfs2_super *osb,
 				        struct ocfs2_lock_res *lockres)
 {
@@ -2896,18 +2892,17 @@ static int ocfs2_cancel_convert(struct ocfs2_super *osb,
 	mlog(0, "lock %s\n", lockres->l_name);
 
 	ret = 0;
-	status = dlmunlock(osb->dlm,
-			   &lockres->l_lksb,
-			   LKM_CANCEL,
-			   ocfs2_unlock_ast,
-			   lockres);
+	status = ocfs2_dlm_unlock(osb->dlm,
+				  &lockres->l_lksb,
+				  LKM_CANCEL,
+				  lockres);
 	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("dlmunlock", status, lockres);
+		ocfs2_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
 		ret = -EINVAL;
 		ocfs2_recover_from_dlm_error(lockres, 0);
 	}
 
-	mlog(0, "lock %s return from dlmunlock\n", lockres->l_name);
+	mlog(0, "lock %s return from ocfs2_dlm_unlock\n", lockres->l_name);
 
 	mlog_exit(ret);
 	return ret;
@@ -3211,6 +3206,23 @@ static int ocfs2_dentry_convert_worker(struct ocfs2_lock_res *lockres,
 	return UNBLOCK_CONTINUE_POST;
 }
 
+static struct ocfs2_locking_protocol lproto = {
+	.lp_lock_ast		= ocfs2_locking_ast,
+	.lp_blocking_ast	= ocfs2_blocking_ast,
+	.lp_unlock_ast		= ocfs2_unlock_ast,
+};
+
+/* This interface isn't the final one, hence the less-than-perfect names */
+void dlmglue_init_stack(void)
+{
+	o2cb_get_stack(&lproto);
+}
+
+void dlmglue_exit_stack(void)
+{
+	o2cb_put_stack();
+}
+
 static void ocfs2_process_blocked_lock(struct ocfs2_super *osb,
 				       struct ocfs2_lock_res *lockres)
 {
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index e3cf902..3238043 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -114,5 +114,8 @@ void ocfs2_wake_downconvert_thread(struct ocfs2_super *osb);
 struct ocfs2_dlm_debug *ocfs2_new_dlm_debug(void);
 void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug *dlm_debug);
 
+void dlmglue_init_stack(void);
+void dlmglue_exit_stack(void);
+
 extern const struct dlm_protocol_version ocfs2_locking_protocol;
 #endif	/* DLMGLUE_H */
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
new file mode 100644
index 0000000..4f44f23
--- /dev/null
+++ b/fs/ocfs2/stackglue.c
@@ -0,0 +1,65 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * stackglue.c
+ *
+ * Code which implements an OCFS2 specific interface to underlying
+ * cluster stacks.
+ *
+ * Copyright (C) 2007 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/types.h>
+#include <linux/list.h>
+
+#include "dlm/dlmapi.h"
+
+#include "stackglue.h"
+
+static struct ocfs2_locking_protocol *lproto;
+
+enum dlm_status ocfs2_dlm_lock(struct dlm_ctxt *dlm,
+		   int mode,
+		   struct dlm_lockstatus *lksb,
+		   u32 flags,
+		   void *name,
+		   unsigned int namelen,
+		   void *astarg)
+{
+	BUG_ON(lproto == NULL);
+	return dlmlock(dlm, mode, lksb, flags, name, namelen,
+		       lproto->lp_lock_ast, astarg,
+		       lproto->lp_blocking_ast);
+}
+
+enum dlm_status ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
+		     struct dlm_lockstatus *lksb,
+		     u32 flags,
+		     void *astarg)
+{
+	BUG_ON(lproto == NULL);
+
+	return dlmunlock(dlm, lksb, flags, lproto->lp_unlock_ast, astarg);
+}
+
+
+void o2cb_get_stack(struct ocfs2_locking_protocol *proto)
+{
+	BUG_ON(proto == NULL);
+
+	lproto = proto;
+}
+
+void o2cb_put_stack(void)
+{
+	lproto = NULL;
+}
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
new file mode 100644
index 0000000..40a0024
--- /dev/null
+++ b/fs/ocfs2/stackglue.h
@@ -0,0 +1,45 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * stackglue.h
+ *
+ * Glue to the underlying cluster stack.
+ *
+ * Copyright (C) 2007 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+
+#ifndef STACKGLUE_H
+#define STACKGLUE_H
+
+struct ocfs2_locking_protocol {
+	void (*lp_lock_ast)(void *astarg);
+	void (*lp_blocking_ast)(void *astarg, int level);
+	void (*lp_unlock_ast)(void *astarg, enum dlm_status status);
+};
+
+enum dlm_status ocfs2_dlm_lock(struct dlm_ctxt *dlm,
+		   int mode,
+		   struct dlm_lockstatus *lksb,
+		   u32 flags,
+		   void *name,
+		   unsigned int namelen,
+		   void *astarg);
+enum dlm_status ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
+		     struct dlm_lockstatus *lksb,
+		     u32 flags,
+		     void *astarg);
+
+void o2cb_get_stack(struct ocfs2_locking_protocol *proto);
+void o2cb_put_stack(void);
+
+#endif  /* STACKGLUE_H */
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 1a4c7c7..c867546 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -933,6 +933,8 @@ static int __init ocfs2_init(void)
 
 	ocfs2_print_version();
 
+	dlmglue_init_stack();
+
 	status = init_ocfs2_uptodate_cache();
 	if (status < 0) {
 		mlog_errno(status);
@@ -988,6 +990,8 @@ static void __exit ocfs2_exit(void)
 
 	exit_ocfs2_uptodate_cache();
 
+	dlmglue_exit_stack();
+
 	mlog_exit_void();
 }
 
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 09/62] ocfs2: Use global DLM_ constants in generic code.
@ 2008-04-02 20:14 Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The ocfs2 generic code should use the values in <linux/dlmconstants.h>.
stackglue.c will convert them to o2dlm values.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |  140 +++++++++++++++++++++++++-------------------------
 fs/ocfs2/stackglue.c |   71 +++++++++++++++++++++++---
 fs/ocfs2/stackglue.h |   13 +++++
 3 files changed, 147 insertions(+), 77 deletions(-)
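
The conversion from generic <linux/dlmconstants.h> values to o2dlm values happens in stackglue.c. One way such a flag translation can be written is a table of generic-to-stack pairs that refuses any bit it cannot map. This is a sketch under assumptions: the flag values below are HYPOTHETICAL placeholders, not the real DLM_LKF_* or LKM_* numbers, and assert() stands in for a BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL values: not the real DLM_LKF_* or o2dlm LKM_* numbers. */
#define SKETCH_GEN_NOQUEUE	0x01u
#define SKETCH_GEN_CONVERT	0x04u
#define SKETCH_GEN_VALBLK	0x08u

#define SKETCH_O2_NOQUEUE	0x10u
#define SKETCH_O2_CONVERT	0x20u
#define SKETCH_O2_VALBLK	0x40u

struct sketch_flag_map {
	uint32_t generic;	/* flag as used by generic ocfs2 code */
	uint32_t stack;		/* equivalent flag for the cluster stack */
};

static const struct sketch_flag_map sketch_flag_table[] = {
	{ SKETCH_GEN_NOQUEUE, SKETCH_O2_NOQUEUE },
	{ SKETCH_GEN_CONVERT, SKETCH_O2_CONVERT },
	{ SKETCH_GEN_VALBLK,  SKETCH_O2_VALBLK  },
};

/* Translate generic lock flags; every bit must have a mapping. */
static uint32_t sketch_flags_to_stack(uint32_t flags)
{
	uint32_t out = 0;
	size_t i;

	for (i = 0; i < sizeof(sketch_flag_table) /
			sizeof(sketch_flag_table[0]); i++) {
		if (flags & sketch_flag_table[i].generic) {
			flags &= ~sketch_flag_table[i].generic;
			out |= sketch_flag_table[i].stack;
		}
	}
	/* A leftover bit has no translation; treat it as a bug. */
	assert(flags == 0);
	return out;
}
```

Failing loudly on an unmapped bit keeps a new generic flag from being silently dropped on its way to the stack.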

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index aea3bef..b8ac903 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -37,8 +37,6 @@
 #include <cluster/nodemanager.h>
 #include <cluster/tcp.h>
 
-#include <dlm/dlmapi.h>
-
 #define MLOG_MASK_PREFIX ML_DLM_GLUE
 #include <cluster/masklog.h>
 
@@ -317,7 +315,7 @@ static inline struct ocfs2_super *ocfs2_get_lockres_osb(struct ocfs2_lock_res *l
 static int ocfs2_lock_create(struct ocfs2_super *osb,
 			     struct ocfs2_lock_res *lockres,
 			     int level,
-			     int dlm_flags);
+			     u32 dlm_flags);
 static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres,
 						     int wanted);
 static void ocfs2_cluster_unlock(struct ocfs2_super *osb,
@@ -407,9 +405,9 @@ static void ocfs2_lock_res_init_common(struct ocfs2_super *osb,
 	res->l_ops           = ops;
 	res->l_priv          = priv;
 
-	res->l_level         = LKM_IVMODE;
-	res->l_requested     = LKM_IVMODE;
-	res->l_blocking      = LKM_IVMODE;
+	res->l_level         = DLM_LOCK_IV;
+	res->l_requested     = DLM_LOCK_IV;
+	res->l_blocking      = DLM_LOCK_IV;
 	res->l_action        = OCFS2_AST_INVALID;
 	res->l_unlock_action = OCFS2_UNLOCK_INVALID;
 
@@ -605,10 +603,10 @@ static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
 	BUG_ON(!lockres);
 
 	switch(level) {
-	case LKM_EXMODE:
+	case DLM_LOCK_EX:
 		lockres->l_ex_holders++;
 		break;
-	case LKM_PRMODE:
+	case DLM_LOCK_PR:
 		lockres->l_ro_holders++;
 		break;
 	default:
@@ -626,11 +624,11 @@ static inline void ocfs2_dec_holders(struct ocfs2_lock_res *lockres,
 	BUG_ON(!lockres);
 
 	switch(level) {
-	case LKM_EXMODE:
+	case DLM_LOCK_EX:
 		BUG_ON(!lockres->l_ex_holders);
 		lockres->l_ex_holders--;
 		break;
-	case LKM_PRMODE:
+	case DLM_LOCK_PR:
 		BUG_ON(!lockres->l_ro_holders);
 		lockres->l_ro_holders--;
 		break;
@@ -645,12 +643,12 @@ static inline void ocfs2_dec_holders(struct ocfs2_lock_res *lockres,
  * lock types are added. */
 static inline int ocfs2_highest_compat_lock_level(int level)
 {
-	int new_level = LKM_EXMODE;
+	int new_level = DLM_LOCK_EX;
 
-	if (level == LKM_EXMODE)
-		new_level = LKM_NLMODE;
-	else if (level == LKM_PRMODE)
-		new_level = LKM_PRMODE;
+	if (level == DLM_LOCK_EX)
+		new_level = DLM_LOCK_NL;
+	else if (level == DLM_LOCK_PR)
+		new_level = DLM_LOCK_PR;
 	return new_level;
 }
 
@@ -689,12 +687,12 @@ static inline void ocfs2_generic_handle_downconvert_action(struct ocfs2_lock_res
 	BUG_ON(!(lockres->l_flags & OCFS2_LOCK_BUSY));
 	BUG_ON(!(lockres->l_flags & OCFS2_LOCK_ATTACHED));
 	BUG_ON(!(lockres->l_flags & OCFS2_LOCK_BLOCKED));
-	BUG_ON(lockres->l_blocking <= LKM_NLMODE);
+	BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
 
 	lockres->l_level = lockres->l_requested;
 	if (lockres->l_level <=
 	    ocfs2_highest_compat_lock_level(lockres->l_blocking)) {
-		lockres->l_blocking = LKM_NLMODE;
+		lockres->l_blocking = DLM_LOCK_NL;
 		lockres_clear_flags(lockres, OCFS2_LOCK_BLOCKED);
 	}
 	lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
@@ -713,7 +711,7 @@ static inline void ocfs2_generic_handle_convert_action(struct ocfs2_lock_res *lo
 	 * information is already up to data. Convert from NL to
 	 * *anything* however should mark ourselves as needing an
 	 * update */
-	if (lockres->l_level == LKM_NLMODE &&
+	if (lockres->l_level == DLM_LOCK_NL &&
 	    lockres->l_ops->flags & LOCK_TYPE_REQUIRES_REFRESH)
 		lockres_or_flags(lockres, OCFS2_LOCK_NEEDS_REFRESH);
 
@@ -730,7 +728,7 @@ static inline void ocfs2_generic_handle_attach_action(struct ocfs2_lock_res *loc
 	BUG_ON((!(lockres->l_flags & OCFS2_LOCK_BUSY)));
 	BUG_ON(lockres->l_flags & OCFS2_LOCK_ATTACHED);
 
-	if (lockres->l_requested > LKM_NLMODE &&
+	if (lockres->l_requested > DLM_LOCK_NL &&
 	    !(lockres->l_flags & OCFS2_LOCK_LOCAL) &&
 	    lockres->l_ops->flags & LOCK_TYPE_REQUIRES_REFRESH)
 		lockres_or_flags(lockres, OCFS2_LOCK_NEEDS_REFRESH);
@@ -775,7 +773,7 @@ static void ocfs2_blocking_ast(void *opaque, int level)
 	int needs_downconvert;
 	unsigned long flags;
 
-	BUG_ON(level <= LKM_NLMODE);
+	BUG_ON(level <= DLM_LOCK_NL);
 
 	mlog(0, "BAST fired for lockres %s, blocking %d, level %d type %s\n",
 	     lockres->l_name, level, lockres->l_level,
@@ -866,7 +864,7 @@ static inline void ocfs2_recover_from_dlm_error(struct ocfs2_lock_res *lockres,
 static int ocfs2_lock_create(struct ocfs2_super *osb,
 			     struct ocfs2_lock_res *lockres,
 			     int level,
-			     int dlm_flags)
+			     u32 dlm_flags)
 {
 	int ret = 0;
 	enum dlm_status status = DLM_NORMAL;
@@ -874,7 +872,7 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 
 	mlog_entry_void();
 
-	mlog(0, "lock %s, level = %d, flags = %d\n", lockres->l_name, level,
+	mlog(0, "lock %s, level = %d, flags = %u\n", lockres->l_name, level,
 	     dlm_flags);
 
 	spin_lock_irqsave(&lockres->l_lock, flags);
@@ -1016,7 +1014,7 @@ static int ocfs2_wait_for_mask_interruptible(struct ocfs2_mask_waiter *mw,
 static int ocfs2_cluster_lock(struct ocfs2_super *osb,
 			      struct ocfs2_lock_res *lockres,
 			      int level,
-			      int lkm_flags,
+			      u32 lkm_flags,
 			      int arg_flags)
 {
 	struct ocfs2_mask_waiter mw;
@@ -1030,7 +1028,7 @@ static int ocfs2_cluster_lock(struct ocfs2_super *osb,
 	ocfs2_init_mask_waiter(&mw);
 
 	if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
-		lkm_flags |= LKM_VALBLK;
+		lkm_flags |= DLM_LKF_VALBLK;
 
 again:
 	wait = 0;
@@ -1074,18 +1072,18 @@ again:
 
 		if (!(lockres->l_flags & OCFS2_LOCK_ATTACHED)) {
 			lockres->l_action = OCFS2_AST_ATTACH;
-			lkm_flags &= ~LKM_CONVERT;
+			lkm_flags &= ~DLM_LKF_CONVERT;
 		} else {
 			lockres->l_action = OCFS2_AST_CONVERT;
-			lkm_flags |= LKM_CONVERT;
+			lkm_flags |= DLM_LKF_CONVERT;
 		}
 
 		lockres->l_requested = level;
 		lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
 		spin_unlock_irqrestore(&lockres->l_lock, flags);
 
-		BUG_ON(level == LKM_IVMODE);
-		BUG_ON(level == LKM_NLMODE);
+		BUG_ON(level == DLM_LOCK_IV);
+		BUG_ON(level == DLM_LOCK_NL);
 
 		mlog(0, "lock %s, convert from %d to level = %d\n",
 		     lockres->l_name, lockres->l_level, level);
@@ -1099,7 +1097,7 @@ again:
 					OCFS2_LOCK_ID_MAX_LEN - 1,
 					lockres);
 		if (status != DLM_NORMAL) {
-			if ((lkm_flags & LKM_NOQUEUE) &&
+			if ((lkm_flags & DLM_LKF_NOQUEUE) &&
 			    (status == DLM_NOTQUEUED))
 				ret = -EAGAIN;
 			else {
@@ -1175,9 +1173,9 @@ static int ocfs2_create_new_lock(struct ocfs2_super *osb,
 				 int ex,
 				 int local)
 {
-	int level =  ex ? LKM_EXMODE : LKM_PRMODE;
+	int level =  ex ? DLM_LOCK_EX : DLM_LOCK_PR;
 	unsigned long flags;
-	int lkm_flags = local ? LKM_LOCAL : 0;
+	u32 lkm_flags = local ? DLM_LKF_LOCAL : 0;
 
 	spin_lock_irqsave(&lockres->l_lock, flags);
 	BUG_ON(lockres->l_flags & OCFS2_LOCK_ATTACHED);
@@ -1220,7 +1218,7 @@ int ocfs2_create_new_inode_locks(struct inode *inode)
 	}
 
 	/*
-	 * We don't want to use LKM_LOCAL on a meta data lock as they
+	 * We don't want to use DLM_LKF_LOCAL on a meta data lock as they
 	 * don't use a generation in their lock names.
 	 */
 	ret = ocfs2_create_new_lock(osb, &OCFS2_I(inode)->ip_inode_lockres, 1, 0);
@@ -1259,7 +1257,7 @@ int ocfs2_rw_lock(struct inode *inode, int write)
 
 	lockres = &OCFS2_I(inode)->ip_rw_lockres;
 
-	level = write ? LKM_EXMODE : LKM_PRMODE;
+	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
 
 	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres, level, 0,
 				    0);
@@ -1272,7 +1270,7 @@ int ocfs2_rw_lock(struct inode *inode, int write)
 
 void ocfs2_rw_unlock(struct inode *inode, int write)
 {
-	int level = write ? LKM_EXMODE : LKM_PRMODE;
+	int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
 	struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_rw_lockres;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
@@ -1310,7 +1308,7 @@ int ocfs2_open_lock(struct inode *inode)
 	lockres = &OCFS2_I(inode)->ip_open_lockres;
 
 	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres,
-				    LKM_PRMODE, 0, 0);
+				    DLM_LOCK_PR, 0, 0);
 	if (status < 0)
 		mlog_errno(status);
 
@@ -1338,16 +1336,16 @@ int ocfs2_try_open_lock(struct inode *inode, int write)
 
 	lockres = &OCFS2_I(inode)->ip_open_lockres;
 
-	level = write ? LKM_EXMODE : LKM_PRMODE;
+	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
 
 	/*
 	 * The file system may already holding a PRMODE/EXMODE open lock.
-	 * Since we pass LKM_NOQUEUE, the request won't block waiting on
+	 * Since we pass DLM_LKF_NOQUEUE, the request won't block waiting on
 	 * other nodes and the -EAGAIN will indicate to the caller that
 	 * this inode is still in use.
 	 */
 	status = ocfs2_cluster_lock(OCFS2_SB(inode->i_sb), lockres,
-				    level, LKM_NOQUEUE, 0);
+				    level, DLM_LKF_NOQUEUE, 0);
 
 out:
 	mlog_exit(status);
@@ -1372,10 +1370,10 @@ void ocfs2_open_unlock(struct inode *inode)
 
 	if(lockres->l_ro_holders)
 		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
-				     LKM_PRMODE);
+				     DLM_LOCK_PR);
 	if(lockres->l_ex_holders)
 		ocfs2_cluster_unlock(OCFS2_SB(inode->i_sb), lockres,
-				     LKM_EXMODE);
+				     DLM_LOCK_EX);
 
 out:
 	mlog_exit_void();
@@ -1462,7 +1460,7 @@ int ocfs2_file_lock(struct file *file, int ex, int trylock)
 	ocfs2_init_mask_waiter(&mw);
 
 	if ((lockres->l_flags & OCFS2_LOCK_BUSY) ||
-	    (lockres->l_level > LKM_NLMODE)) {
+	    (lockres->l_level > DLM_LOCK_NL)) {
 		mlog(ML_ERROR,
 		     "File lock \"%s\" has busy or locked state: flags: 0x%lx, "
 		     "level: %u\n", lockres->l_name, lockres->l_flags,
@@ -1570,7 +1568,7 @@ void ocfs2_file_unlock(struct file *file)
 	 * Fake a blocking ast for the downconvert code.
 	 */
 	lockres_or_flags(lockres, OCFS2_LOCK_BLOCKED);
-	lockres->l_blocking = LKM_EXMODE;
+	lockres->l_blocking = DLM_LOCK_EX;
 
 	ocfs2_prepare_downconvert(lockres, LKM_NLMODE);
 	lockres_add_mask_waiter(lockres, &mw, OCFS2_LOCK_BUSY, 0);
@@ -1599,11 +1597,11 @@ static void ocfs2_downconvert_on_unlock(struct ocfs2_super *osb,
 	 * condition. */
 	if (lockres->l_flags & OCFS2_LOCK_BLOCKED) {
 		switch(lockres->l_blocking) {
-		case LKM_EXMODE:
+		case DLM_LOCK_EX:
 			if (!lockres->l_ex_holders && !lockres->l_ro_holders)
 				kick = 1;
 			break;
-		case LKM_PRMODE:
+		case DLM_LOCK_PR:
 			if (!lockres->l_ex_holders)
 				kick = 1;
 			break;
@@ -1921,7 +1919,8 @@ int ocfs2_inode_lock_full(struct inode *inode,
 			 int ex,
 			 int arg_flags)
 {
-	int status, level, dlm_flags, acquired;
+	int status, level, acquired;
+	u32 dlm_flags;
 	struct ocfs2_lock_res *lockres = NULL;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	struct buffer_head *local_bh = NULL;
@@ -1951,10 +1950,10 @@ int ocfs2_inode_lock_full(struct inode *inode,
 		ocfs2_wait_for_recovery(osb);
 
 	lockres = &OCFS2_I(inode)->ip_inode_lockres;
-	level = ex ? LKM_EXMODE : LKM_PRMODE;
+	level = ex ? DLM_LOCK_EX : DLM_LOCK_PR;
 	dlm_flags = 0;
 	if (arg_flags & OCFS2_META_LOCK_NOQUEUE)
-		dlm_flags |= LKM_NOQUEUE;
+		dlm_flags |= DLM_LKF_NOQUEUE;
 
 	status = ocfs2_cluster_lock(osb, lockres, level, dlm_flags, arg_flags);
 	if (status < 0) {
@@ -2105,7 +2104,7 @@ int ocfs2_inode_lock_atime(struct inode *inode,
 void ocfs2_inode_unlock(struct inode *inode,
 		       int ex)
 {
-	int level = ex ? LKM_EXMODE : LKM_PRMODE;
+	int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR;
 	struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
@@ -2126,7 +2125,7 @@ int ocfs2_super_lock(struct ocfs2_super *osb,
 		     int ex)
 {
 	int status = 0;
-	int level = ex ? LKM_EXMODE : LKM_PRMODE;
+	int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR;
 	struct ocfs2_lock_res *lockres = &osb->osb_super_lockres;
 
 	mlog_entry_void();
@@ -2168,7 +2167,7 @@ bail:
 void ocfs2_super_unlock(struct ocfs2_super *osb,
 			int ex)
 {
-	int level = ex ? LKM_EXMODE : LKM_PRMODE;
+	int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR;
 	struct ocfs2_lock_res *lockres = &osb->osb_super_lockres;
 
 	if (!ocfs2_mount_local(osb))
@@ -2186,7 +2185,7 @@ int ocfs2_rename_lock(struct ocfs2_super *osb)
 	if (ocfs2_mount_local(osb))
 		return 0;
 
-	status = ocfs2_cluster_lock(osb, lockres, LKM_EXMODE, 0, 0);
+	status = ocfs2_cluster_lock(osb, lockres, DLM_LOCK_EX, 0, 0);
 	if (status < 0)
 		mlog_errno(status);
 
@@ -2198,13 +2197,13 @@ void ocfs2_rename_unlock(struct ocfs2_super *osb)
 	struct ocfs2_lock_res *lockres = &osb->osb_rename_lockres;
 
 	if (!ocfs2_mount_local(osb))
-		ocfs2_cluster_unlock(osb, lockres, LKM_EXMODE);
+		ocfs2_cluster_unlock(osb, lockres, DLM_LOCK_EX);
 }
 
 int ocfs2_dentry_lock(struct dentry *dentry, int ex)
 {
 	int ret;
-	int level = ex ? LKM_EXMODE : LKM_PRMODE;
+	int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR;
 	struct ocfs2_dentry_lock *dl = dentry->d_fsdata;
 	struct ocfs2_super *osb = OCFS2_SB(dentry->d_sb);
 
@@ -2225,7 +2224,7 @@ int ocfs2_dentry_lock(struct dentry *dentry, int ex)
 
 void ocfs2_dentry_unlock(struct dentry *dentry, int ex)
 {
-	int level = ex ? LKM_EXMODE : LKM_PRMODE;
+	int level = ex ? DLM_LOCK_EX : DLM_LOCK_PR;
 	struct ocfs2_dentry_lock *dl = dentry->d_fsdata;
 	struct ocfs2_super *osb = OCFS2_SB(dentry->d_sb);
 
@@ -2614,7 +2613,7 @@ static void ocfs2_unlock_ast(void *opaque, enum dlm_status status)
 		lockres->l_action = OCFS2_AST_INVALID;
 		break;
 	case OCFS2_UNLOCK_DROP_LOCK:
-		lockres->l_level = LKM_IVMODE;
+		lockres->l_level = DLM_LOCK_IV;
 		break;
 	default:
 		BUG();
@@ -2635,14 +2634,14 @@ static int ocfs2_drop_lock(struct ocfs2_super *osb,
 {
 	enum dlm_status status;
 	unsigned long flags;
-	int lkm_flags = 0;
+	u32 lkm_flags = 0;
 
 	/* We didn't get anywhere near actually using this lockres. */
 	if (!(lockres->l_flags & OCFS2_LOCK_INITIALIZED))
 		goto out;
 
 	if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
-		lkm_flags |= LKM_VALBLK;
+		lkm_flags |= DLM_LKF_VALBLK;
 
 	spin_lock_irqsave(&lockres->l_lock, flags);
 
@@ -2668,7 +2667,7 @@ static int ocfs2_drop_lock(struct ocfs2_super *osb,
 
 	if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB) {
 		if (lockres->l_flags & OCFS2_LOCK_ATTACHED &&
-		    lockres->l_level == LKM_EXMODE &&
+		    lockres->l_level == DLM_LOCK_EX &&
 		    !(lockres->l_flags & OCFS2_LOCK_NEEDS_REFRESH))
 			lockres->l_ops->set_lvb(lockres);
 	}
@@ -2801,10 +2800,10 @@ static void ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
 {
 	assert_spin_locked(&lockres->l_lock);
 
-	BUG_ON(lockres->l_blocking <= LKM_NLMODE);
+	BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
 
 	if (lockres->l_level <= new_level) {
-		mlog(ML_ERROR, "lockres->l_level (%u) <= new_level (%u)\n",
+		mlog(ML_ERROR, "lockres->l_level (%d) <= new_level (%d)\n",
 		     lockres->l_level, new_level);
 		BUG();
 	}
@@ -2822,13 +2821,14 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
 				  int new_level,
 				  int lvb)
 {
-	int ret, dlm_flags = LKM_CONVERT;
+	int ret;
+	u32 dlm_flags = DLM_LKF_CONVERT;
 	enum dlm_status status;
 
 	mlog_entry_void();
 
 	if (lvb)
-		dlm_flags |= LKM_VALBLK;
+		dlm_flags |= DLM_LKF_VALBLK;
 
 	status = ocfs2_dlm_lock(osb->dlm,
 				new_level,
@@ -2894,7 +2894,7 @@ static int ocfs2_cancel_convert(struct ocfs2_super *osb,
 	ret = 0;
 	status = ocfs2_dlm_unlock(osb->dlm,
 				  &lockres->l_lksb,
-				  LKM_CANCEL,
+				  DLM_LKF_CANCEL,
 				  lockres);
 	if (status != DLM_NORMAL) {
 		ocfs2_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
@@ -2939,13 +2939,13 @@ recheck:
 
 	/* if we're blocking an exclusive and we have *any* holders,
 	 * then requeue. */
-	if ((lockres->l_blocking == LKM_EXMODE)
+	if ((lockres->l_blocking == DLM_LOCK_EX)
 	    && (lockres->l_ex_holders || lockres->l_ro_holders))
 		goto leave_requeue;
 
 	/* If it's a PR we're blocking, then only
 	 * requeue if we've got any EX holders */
-	if (lockres->l_blocking == LKM_PRMODE &&
+	if (lockres->l_blocking == DLM_LOCK_PR &&
 	    lockres->l_ex_holders)
 		goto leave_requeue;
 
@@ -2992,7 +2992,7 @@ downconvert:
 	ctl->requeue = 0;
 
 	if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB) {
-		if (lockres->l_level == LKM_EXMODE)
+		if (lockres->l_level == DLM_LOCK_EX)
 			set_lvb = 1;
 
 		/*
@@ -3046,7 +3046,7 @@ static int ocfs2_data_convert_worker(struct ocfs2_lock_res *lockres,
 		     (unsigned long long)OCFS2_I(inode)->ip_blkno);
 	}
 	sync_mapping_buffers(mapping);
-	if (blocking == LKM_EXMODE) {
+	if (blocking == DLM_LOCK_EX) {
 		truncate_inode_pages(mapping, 0);
 	} else {
 		/* We only need to wait on the I/O if we're not also
@@ -3067,8 +3067,8 @@ static int ocfs2_check_meta_downconvert(struct ocfs2_lock_res *lockres,
 	struct inode *inode = ocfs2_lock_res_inode(lockres);
 	int checkpointed = ocfs2_inode_fully_checkpointed(inode);
 
-	BUG_ON(new_level != LKM_NLMODE && new_level != LKM_PRMODE);
-	BUG_ON(lockres->l_level != LKM_EXMODE && !checkpointed);
+	BUG_ON(new_level != DLM_LOCK_NL && new_level != DLM_LOCK_PR);
+	BUG_ON(lockres->l_level != DLM_LOCK_EX && !checkpointed);
 
 	if (checkpointed)
 		return 1;
@@ -3132,7 +3132,7 @@ static int ocfs2_dentry_convert_worker(struct ocfs2_lock_res *lockres,
 	 * valid. The downconvert code will retain a PR for this node,
 	 * so there's no further work to do.
 	 */
-	if (blocking == LKM_PRMODE)
+	if (blocking == DLM_LOCK_PR)
 		return UNBLOCK_CONTINUE;
 
 	/*
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 4f44f23..9953804 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -18,15 +18,65 @@
  * General Public License for more details.
  */
 
-#include <linux/types.h>
-#include <linux/list.h>
-
-#include "dlm/dlmapi.h"
-
 #include "stackglue.h"
 
 static struct ocfs2_locking_protocol *lproto;
 
+/* These should be identical */
+#if (DLM_LOCK_IV != LKM_IVMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_NL != LKM_NLMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_CR != LKM_CRMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_CW != LKM_CWMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_PR != LKM_PRMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_PW != LKM_PWMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_EX != LKM_EXMODE)
+# error Lock modes do not match
+#endif
+static inline int mode_to_o2dlm(int mode)
+{
+	BUG_ON(mode > LKM_MAXMODE);
+
+	return mode;
+}
+
+#define map_flag(_generic, _o2dlm)		\
+	if (flags & (_generic)) {		\
+		flags &= ~(_generic);		\
+		o2dlm_flags |= (_o2dlm);	\
+	}
+static int flags_to_o2dlm(u32 flags)
+{
+	int o2dlm_flags = 0;
+
+	map_flag(DLM_LKF_NOQUEUE, LKM_NOQUEUE);
+	map_flag(DLM_LKF_CANCEL, LKM_CANCEL);
+	map_flag(DLM_LKF_CONVERT, LKM_CONVERT);
+	map_flag(DLM_LKF_VALBLK, LKM_VALBLK);
+	map_flag(DLM_LKF_IVVALBLK, LKM_INVVALBLK);
+	map_flag(DLM_LKF_ORPHAN, LKM_ORPHAN);
+	map_flag(DLM_LKF_FORCEUNLOCK, LKM_FORCE);
+	map_flag(DLM_LKF_TIMEOUT, LKM_TIMEOUT);
+	map_flag(DLM_LKF_LOCAL, LKM_LOCAL);
+
+	/* map_flag() should have cleared every flag passed in */
+	BUG_ON(flags != 0);
+
+	return o2dlm_flags;
+}
+#undef map_flag
+
 enum dlm_status ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 		   int mode,
 		   struct dlm_lockstatus *lksb,
@@ -35,8 +85,12 @@ enum dlm_status ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 		   unsigned int namelen,
 		   void *astarg)
 {
+	int o2dlm_mode = mode_to_o2dlm(mode);
+	int o2dlm_flags = flags_to_o2dlm(flags);
+
 	BUG_ON(lproto == NULL);
-	return dlmlock(dlm, mode, lksb, flags, name, namelen,
+
+	return dlmlock(dlm, o2dlm_mode, lksb, o2dlm_flags, name, namelen,
 		       lproto->lp_lock_ast, astarg,
 		       lproto->lp_blocking_ast);
 }
@@ -46,9 +100,12 @@ enum dlm_status ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
 		     u32 flags,
 		     void *astarg)
 {
+	int o2dlm_flags = flags_to_o2dlm(flags);
+
 	BUG_ON(lproto == NULL);
 
-	return dlmunlock(dlm, lksb, flags, lproto->lp_unlock_ast, astarg);
+	return dlmunlock(dlm, lksb, o2dlm_flags,
+			 lproto->lp_unlock_ast, astarg);
 }
 
 
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 40a0024..986d059 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -21,6 +21,19 @@
 #ifndef STACKGLUE_H
 #define STACKGLUE_H
 
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/dlmconstants.h>
+
+/*
+ * dlmconstants.h does not have a LOCAL flag.  We hope to remove it
+ * some day, but right now we need it.  Let's fake it.  This value is larger
+ * than any flag in dlmconstants.h.
+ */
+#define DLM_LKF_LOCAL		0x00100000
+
+#include "dlm/dlmapi.h"
+
 struct ocfs2_locking_protocol {
 	void (*lp_lock_ast)(void *astarg);
 	void (*lp_blocking_ast)(void *astarg, int level);
-- 
1.5.4.1

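The `map_flag()` helper added to stackglue.c above moves each generic `DLM_LKF_*` bit into its o2dlm counterpart and clears it from the input. Whatever remains afterwards has no mapping, so `BUG_ON(flags != 0)` catches it instead of letting it be silently dropped. The userspace sketch below isolates that technique. It makes some assumptions:

- The `GEN_*` and `O2_*` values are placeholders, not the real constants from `<linux/dlmconstants.h>` or `dlm/dlmapi.h`.
- `GEN_LOCAL` only imitates the faked `DLM_LKF_LOCAL` bit, which is deliberately placed above every real generic flag.
- `assert()` stands in for `BUG_ON()`.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values for illustration only. */
#define GEN_NOQUEUE 0x00000001u
#define GEN_CONVERT 0x00000004u
#define GEN_VALBLK  0x00000008u
#define GEN_LOCAL   0x00100000u	/* faked bit, above all real generic flags */

#define O2_NOQUEUE  0x00000010
#define O2_CONVERT  0x00000002
#define O2_VALBLK   0x00000100
#define O2_LOCAL    0x00000020

/* Move one bit from the generic mask into the o2dlm mask.  Like the
 * kernel macro, this relies on locals named flags and o2dlm_flags. */
#define MAP_FLAG(_generic, _o2dlm)			\
	do {						\
		if (flags & (_generic)) {		\
			flags &= ~(uint32_t)(_generic);	\
			o2dlm_flags |= (_o2dlm);	\
		}					\
	} while (0)

static int flags_to_o2dlm_sketch(uint32_t flags)
{
	int o2dlm_flags = 0;

	MAP_FLAG(GEN_NOQUEUE, O2_NOQUEUE);
	MAP_FLAG(GEN_CONVERT, O2_CONVERT);
	MAP_FLAG(GEN_VALBLK, O2_VALBLK);
	MAP_FLAG(GEN_LOCAL, O2_LOCAL);

	/* Any bit still set had no MAP_FLAG() line: fail loudly. */
	assert(flags == 0);

	return o2dlm_flags;
}
```

The `do { } while (0)` wrapper is a small departure from the patch's bare `if`. It keeps the macro safe inside an unbraced `if`/`else`. Passing any bit without a `MAP_FLAG()` line trips the assert.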
* [Ocfs2-devel] [PATCH 10/62] ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API.
  2008-04-02 20:14                 ` [Ocfs2-devel] [PATCH 09/62] ocfs2: Use global DLM_ constants in generic code Mark Fasheh
@ 2008-04-02 20:14                   ` Mark Fasheh
  2008-04-02 20:14                     ` [Ocfs2-devel] [PATCH 11/62] ocfs2: Create the lock status block union Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
This is the first step towards eliminating dlm_status in
fs/ocfs2/dlmglue.c.  The change also passes -errno values to
->unlock_ast().

[ Fix a return code in dlmglue.c and change the error translation table into
  an array of ints. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |  116 ++++++++++++++++++-----------------------
 fs/ocfs2/stackglue.c |  142 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/ocfs2/stackglue.h |    6 +-
 3 files changed, 188 insertions(+), 76 deletions(-)

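The translation described above is an array of ints indexed by the o2dlm status. It is built with designated initializers and guarded by a bounds check. On the caller side, a `NOQUEUE` request that comes back `-EAGAIN` is an expected outcome and is not logged.

The sketch below shows both pieces. It uses a hypothetical four-entry `sketch_status` enum instead of the real `enum dlm_status`. `SK_ECANCEL` is a placeholder value, not the real `DLM_ECANCEL`. The bounds check must reject an index equal to the element count, which is why it compares with `<` against `ARRAY_SIZE`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of o2dlm status codes, for illustration only. */
enum sketch_status {
	SK_NORMAL = 0,
	SK_DENIED,
	SK_NOTQUEUED,
	SK_CANCELGRANT,
	SK_MAXSTATS,
};

/* Placeholder "special" errno, kept outside the normal errno range. */
#define SK_ECANCEL 0x10001

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Designated initializers keep each errno next to the status it maps. */
static const int sketch_status_map[] = {
	[SK_NORMAL]      = 0,
	[SK_DENIED]      = -EACCES,
	[SK_NOTQUEUED]   = -EAGAIN,	/* trylock failed */
	[SK_CANCELGRANT] = -SK_ECANCEL,	/* cancel lost the race with a grant */
	[SK_MAXSTATS]    = -EINVAL,
};

static int sketch_status_to_errno(enum sketch_status status)
{
	/* An index equal to the element count is already past the end. */
	assert((size_t)status < ARRAY_SIZE(sketch_status_map));
	return sketch_status_map[status];
}

/* Caller side, mirroring the ocfs2_cluster_lock() check: log unless
 * this was a NOQUEUE request that simply could not be granted. */
static int sketch_should_log(int ret, int noqueue)
{
	return ret && (!noqueue || ret != -EAGAIN);
}
```
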
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index b8ac903..6a222a5 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -329,10 +329,9 @@ static void ocfs2_schedule_blocked_lock(struct ocfs2_super *osb,
 					struct ocfs2_lock_res *lockres);
 static inline void ocfs2_recover_from_dlm_error(struct ocfs2_lock_res *lockres,
 						int convert);
-#define ocfs2_log_dlm_error(_func, _stat, _lockres) do {	\
-	mlog(ML_ERROR, "Dlm error \"%s\" while calling %s on "	\
-		"resource %s: %s\n", dlm_errname(_stat), _func,	\
-		_lockres->l_name, dlm_errmsg(_stat));		\
+#define ocfs2_log_dlm_error(_func, _err, _lockres) do {			\
+	mlog(ML_ERROR, "DLM error %d while calling %s on resource %s\n", \
+	     _err, _func, _lockres->l_name);				\
 } while (0)
 static int ocfs2_downconvert_thread(void *arg);
 static void ocfs2_downconvert_on_unlock(struct ocfs2_super *osb,
@@ -867,7 +866,6 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 			     u32 dlm_flags)
 {
 	int ret = 0;
-	enum dlm_status status = DLM_NORMAL;
 	unsigned long flags;
 
 	mlog_entry_void();
@@ -887,21 +885,19 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 	lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
-	status = ocfs2_dlm_lock(osb->dlm,
-				level,
-				&lockres->l_lksb,
-				dlm_flags,
-				lockres->l_name,
-				OCFS2_LOCK_ID_MAX_LEN - 1,
-				lockres);
-	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("ocfs2_dlm_lock", status, lockres);
-		ret = -EINVAL;
+	ret = ocfs2_dlm_lock(osb->dlm,
+			     level,
+			     &lockres->l_lksb,
+			     dlm_flags,
+			     lockres->l_name,
+			     OCFS2_LOCK_ID_MAX_LEN - 1,
+			     lockres);
+	if (ret) {
+		ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres);
 		ocfs2_recover_from_dlm_error(lockres, 1);
 	}
 
-	mlog(0, "lock %s, successfull return from ocfs2_dlm_lock\n",
-	     lockres->l_name);
+	mlog(0, "lock %s, return from ocfs2_dlm_lock\n", lockres->l_name);
 
 bail:
 	mlog_exit(ret);
@@ -1018,7 +1014,6 @@ static int ocfs2_cluster_lock(struct ocfs2_super *osb,
 			      int arg_flags)
 {
 	struct ocfs2_mask_waiter mw;
-	enum dlm_status status;
 	int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR);
 	int ret = 0; /* gcc doesn't realize wait = 1 guarantees ret is set */
 	unsigned long flags;
@@ -1089,21 +1084,18 @@ again:
 		     lockres->l_name, lockres->l_level, level);
 
 		/* call dlm_lock to upgrade lock now */
-		status = ocfs2_dlm_lock(osb->dlm,
-					level,
-					&lockres->l_lksb,
-					lkm_flags,
-					lockres->l_name,
-					OCFS2_LOCK_ID_MAX_LEN - 1,
-					lockres);
-		if (status != DLM_NORMAL) {
-			if ((lkm_flags & DLM_LKF_NOQUEUE) &&
-			    (status == DLM_NOTQUEUED))
-				ret = -EAGAIN;
-			else {
+		ret = ocfs2_dlm_lock(osb->dlm,
+				     level,
+				     &lockres->l_lksb,
+				     lkm_flags,
+				     lockres->l_name,
+				     OCFS2_LOCK_ID_MAX_LEN - 1,
+				     lockres);
+		if (ret) {
+			if (!(lkm_flags & DLM_LKF_NOQUEUE) ||
+			    (ret != -EAGAIN)) {
 				ocfs2_log_dlm_error("ocfs2_dlm_lock",
-						    status, lockres);
-				ret = -EINVAL;
+						    ret, lockres);
 			}
 			ocfs2_recover_from_dlm_error(lockres, 1);
 			goto out;
@@ -1502,10 +1494,8 @@ int ocfs2_file_lock(struct file *file, int ex, int trylock)
 	ret = ocfs2_dlm_lock(osb->dlm, level, &lockres->l_lksb, lkm_flags,
 			     lockres->l_name, OCFS2_LOCK_ID_MAX_LEN - 1,
 			     lockres);
-	if (ret != DLM_NORMAL) {
-		if (trylock && ret == DLM_NOTQUEUED)
-			ret = -EAGAIN;
-		else {
+	if (ret) {
+		if (!trylock || (ret != -EAGAIN)) {
 			ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres);
 			ret = -EINVAL;
 		}
@@ -2573,7 +2563,7 @@ void ocfs2_dlm_shutdown(struct ocfs2_super *osb)
 	mlog_exit_void();
 }
 
-static void ocfs2_unlock_ast(void *opaque, enum dlm_status status)
+static void ocfs2_unlock_ast(void *opaque, int error)
 {
 	struct ocfs2_lock_res *lockres = opaque;
 	unsigned long flags;
@@ -2589,7 +2579,7 @@ static void ocfs2_unlock_ast(void *opaque, enum dlm_status status)
 	 * state. The wake_up call done at the bottom is redundant
 	 * (ocfs2_prepare_cancel_convert doesn't sleep on this) but doesn't
 	 * hurt anything anyway */
-	if (status == DLM_CANCELGRANT &&
+	if (error == -DLM_ECANCEL &&
 	    lockres->l_unlock_action == OCFS2_UNLOCK_CANCEL_CONVERT) {
 		mlog(0, "Got cancelgrant for %s\n", lockres->l_name);
 
@@ -2599,9 +2589,10 @@ static void ocfs2_unlock_ast(void *opaque, enum dlm_status status)
 		goto complete_unlock;
 	}
 
-	if (status != DLM_NORMAL) {
-		mlog(ML_ERROR, "Dlm passes status %d for lock %s, "
-		     "unlock_action %d\n", status, lockres->l_name,
+	/* DLM_EUNLOCK is the success code for unlock */
+	if (error != -DLM_EUNLOCK) {
+		mlog(ML_ERROR, "Dlm passes error %d for lock %s, "
+		     "unlock_action %d\n", error, lockres->l_name,
 		     lockres->l_unlock_action);
 		spin_unlock_irqrestore(&lockres->l_lock, flags);
 		return;
@@ -2632,7 +2623,7 @@ complete_unlock:
 static int ocfs2_drop_lock(struct ocfs2_super *osb,
 			   struct ocfs2_lock_res *lockres)
 {
-	enum dlm_status status;
+	int ret;
 	unsigned long flags;
 	u32 lkm_flags = 0;
 
@@ -2696,10 +2687,10 @@ static int ocfs2_drop_lock(struct ocfs2_super *osb,
 
 	mlog(0, "lock %s\n", lockres->l_name);
 
-	status = ocfs2_dlm_unlock(osb->dlm, &lockres->l_lksb, lkm_flags,
-				  lockres);
-	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
+	ret = ocfs2_dlm_unlock(osb->dlm, &lockres->l_lksb, lkm_flags,
+			       lockres);
+	if (ret) {
+		ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres);
 		mlog(ML_ERROR, "lockres flags: %lu\n", lockres->l_flags);
 		dlm_print_one_lock(lockres->l_lksb.lockid);
 		BUG();
@@ -2823,23 +2814,21 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
 {
 	int ret;
 	u32 dlm_flags = DLM_LKF_CONVERT;
-	enum dlm_status status;
 
 	mlog_entry_void();
 
 	if (lvb)
 		dlm_flags |= DLM_LKF_VALBLK;
 
-	status = ocfs2_dlm_lock(osb->dlm,
-				new_level,
-				&lockres->l_lksb,
-				dlm_flags,
-				lockres->l_name,
-				OCFS2_LOCK_ID_MAX_LEN - 1,
-				lockres);
-	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("ocfs2_dlm_lock", status, lockres);
-		ret = -EINVAL;
+	ret = ocfs2_dlm_lock(osb->dlm,
+			     new_level,
+			     &lockres->l_lksb,
+			     dlm_flags,
+			     lockres->l_name,
+			     OCFS2_LOCK_ID_MAX_LEN - 1,
+			     lockres);
+	if (ret) {
+		ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres);
 		ocfs2_recover_from_dlm_error(lockres, 1);
 		goto bail;
 	}
@@ -2886,19 +2875,14 @@ static int ocfs2_cancel_convert(struct ocfs2_super *osb,
 				struct ocfs2_lock_res *lockres)
 {
 	int ret;
-	enum dlm_status status;
 
 	mlog_entry_void();
 	mlog(0, "lock %s\n", lockres->l_name);
 
-	ret = 0;
-	status = ocfs2_dlm_unlock(osb->dlm,
-				  &lockres->l_lksb,
-				  DLM_LKF_CANCEL,
-				  lockres);
-	if (status != DLM_NORMAL) {
-		ocfs2_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
-		ret = -EINVAL;
+	ret = ocfs2_dlm_unlock(osb->dlm, &lockres->l_lksb,
+			       DLM_LKF_CANCEL, lockres);
+	if (ret) {
+		ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres);
 		ocfs2_recover_from_dlm_error(lockres, 0);
 	}
 
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 9953804..0aec2fc 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -18,6 +18,7 @@
  * General Public License for more details.
  */
 
+#include "cluster/masklog.h"
 #include "stackglue.h"
 
 static struct ocfs2_locking_protocol *lproto;
@@ -77,7 +78,126 @@ static int flags_to_o2dlm(u32 flags)
 }
 #undef map_flag
 
-enum dlm_status ocfs2_dlm_lock(struct dlm_ctxt *dlm,
+/*
+ * Map an o2dlm status to standard errno values.
+ *
+ * o2dlm only uses a handful of these, and returns even fewer to the
+ * caller. Still, we try to assign sane values to each error.
+ *
+ * The following value pairs have special meanings to dlmglue, thus
+ * the right hand side needs to stay unique - never duplicate the
+ * mapping elsewhere in the table!
+ *
+ * DLM_NORMAL:		0
+ * DLM_NOTQUEUED:	-EAGAIN
+ * DLM_CANCELGRANT:	-DLM_ECANCEL
+ * DLM_CANCEL:		-DLM_EUNLOCK
+ */
+/* Keep in sync with dlmapi.h */
+static int status_map[] = {
+	[DLM_NORMAL]			= 0,		/* Success */
+	[DLM_GRANTED]			= -EINVAL,
+	[DLM_DENIED]			= -EACCES,
+	[DLM_DENIED_NOLOCKS]		= -EACCES,
+	[DLM_WORKING]			= -EBUSY,
+	[DLM_BLOCKED]			= -EINVAL,
+	[DLM_BLOCKED_ORPHAN]		= -EINVAL,
+	[DLM_DENIED_GRACE_PERIOD]	= -EACCES,
+	[DLM_SYSERR]			= -ENOMEM,	/* It is what it is */
+	[DLM_NOSUPPORT]			= -EPROTO,
+	[DLM_CANCELGRANT]		= -DLM_ECANCEL, /* Cancel after grant */
+	[DLM_IVLOCKID]			= -EINVAL,
+	[DLM_SYNC]			= -EINVAL,
+	[DLM_BADTYPE]			= -EINVAL,
+	[DLM_BADRESOURCE]		= -EINVAL,
+	[DLM_MAXHANDLES]		= -ENOMEM,
+	[DLM_NOCLINFO]			= -EINVAL,
+	[DLM_NOLOCKMGR]			= -EINVAL,
+	[DLM_NOPURGED]			= -EINVAL,
+	[DLM_BADARGS]			= -EINVAL,
+	[DLM_VOID]			= -EINVAL,
+	[DLM_NOTQUEUED]			= -EAGAIN,	/* Trylock failed */
+	[DLM_IVBUFLEN]			= -EINVAL,
+	[DLM_CVTUNGRANT]		= -EPERM,
+	[DLM_BADPARAM]			= -EINVAL,
+	[DLM_VALNOTVALID]		= -EINVAL,
+	[DLM_REJECTED]			= -EPERM,
+	[DLM_ABORT]			= -EINVAL,
+	[DLM_CANCEL]			= -DLM_EUNLOCK,	/* Successful cancel */
+	[DLM_IVRESHANDLE]		= -EINVAL,
+	[DLM_DEADLOCK]			= -EDEADLK,
+	[DLM_DENIED_NOASTS]		= -EINVAL,
+	[DLM_FORWARD]			= -EINVAL,
+	[DLM_TIMEOUT]			= -ETIMEDOUT,
+	[DLM_IVGROUPID]			= -EINVAL,
+	[DLM_VERS_CONFLICT]		= -EOPNOTSUPP,
+	[DLM_BAD_DEVICE_PATH]		= -ENOENT,
+	[DLM_NO_DEVICE_PERMISSION]	= -EPERM,
+	[DLM_NO_CONTROL_DEVICE]		= -ENOENT,
+	[DLM_RECOVERING]		= -ENOTCONN,
+	[DLM_MIGRATING]			= -ERESTART,
+	[DLM_MAXSTATS]			= -EINVAL,
+};
+static int dlm_status_to_errno(enum dlm_status status)
+{
+	BUG_ON(status >= (sizeof(status_map) / sizeof(status_map[0])));
+
+	return status_map[status];
+}
+
+static void o2dlm_lock_ast_wrapper(void *astarg)
+{
+	BUG_ON(lproto == NULL);
+
+	lproto->lp_lock_ast(astarg);
+}
+
+static void o2dlm_blocking_ast_wrapper(void *astarg, int level)
+{
+	BUG_ON(lproto == NULL);
+
+	lproto->lp_blocking_ast(astarg, level);
+}
+
+static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
+{
+	int error;
+
+	BUG_ON(lproto == NULL);
+
+	/*
+	 * XXX: CANCEL values are sketchy.
+	 *
+	 * Currently we have preserved the o2dlm paradigm.  You can get
+	 * unlock_ast() whether the cancel succeeded or not.
+	 *
+	 * First, we're going to pass DLM_EUNLOCK just like fs/dlm does for
+	 * successful unlocks.  That is a clean behavior.
+	 *
+	 * In o2dlm, you can get both the lock_ast() for the lock being
+	 * granted and the unlock_ast() for the CANCEL failing.  A
+	 * successful cancel sends DLM_NORMAL here.  If the
+	 * lock grant happened before the cancel arrived, you get
+	 * DLM_CANCELGRANT.  For now, we'll use DLM_ECANCEL to signify
+	 * CANCELGRANT - the CANCEL was supposed to happen but didn't.  We
+	 * can then use DLM_EUNLOCK to signify a successful CANCEL -
+	 * effectively, the CANCEL caused the lock to roll back.
+	 *
+	 * In the future, we will likely move the o2dlm to send only one
+	 * ast - either unlock_ast() for a successful CANCEL or lock_ast()
+	 * when the grant succeeds.  At that point, we'll send DLM_ECANCEL
+	 * for all cancel results (CANCELGRANT will no longer exist).
+	 */
+	error = dlm_status_to_errno(status);
+
+	/* Successful unlock is DLM_EUNLOCK */
+	if (!error)
+		error = -DLM_EUNLOCK;
+
+	lproto->lp_unlock_ast(astarg, error);
+}
+
+int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 		   int mode,
 		   struct dlm_lockstatus *lksb,
 		   u32 flags,
@@ -85,27 +205,35 @@ enum dlm_status ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 		   unsigned int namelen,
 		   void *astarg)
 {
+	enum dlm_status status;
 	int o2dlm_mode = mode_to_o2dlm(mode);
 	int o2dlm_flags = flags_to_o2dlm(flags);
+	int ret;
 
 	BUG_ON(lproto == NULL);
 
-	return dlmlock(dlm, o2dlm_mode, lksb, o2dlm_flags, name, namelen,
-		       lproto->lp_lock_ast, astarg,
-		       lproto->lp_blocking_ast);
+	status = dlmlock(dlm, o2dlm_mode, lksb, o2dlm_flags, name, namelen,
+		       o2dlm_lock_ast_wrapper, astarg,
+		       o2dlm_blocking_ast_wrapper);
+	ret = dlm_status_to_errno(status);
+	return ret;
 }
 
-enum dlm_status ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
+int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
 		     struct dlm_lockstatus *lksb,
 		     u32 flags,
 		     void *astarg)
 {
+	enum dlm_status status;
 	int o2dlm_flags = flags_to_o2dlm(flags);
+	int ret;
 
 	BUG_ON(lproto == NULL);
 
-	return dlmunlock(dlm, lksb, o2dlm_flags,
-			 lproto->lp_unlock_ast, astarg);
+	status = dlmunlock(dlm, lksb, o2dlm_flags,
+			 o2dlm_unlock_ast_wrapper, astarg);
+	ret = dlm_status_to_errno(status);
+	return ret;
 }
 
 
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 986d059..8ebcfba 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -37,17 +37,17 @@
 struct ocfs2_locking_protocol {
 	void (*lp_lock_ast)(void *astarg);
 	void (*lp_blocking_ast)(void *astarg, int level);
-	void (*lp_unlock_ast)(void *astarg, enum dlm_status status);
+	void (*lp_unlock_ast)(void *astarg, int error);
 };
 
-enum dlm_status ocfs2_dlm_lock(struct dlm_ctxt *dlm,
+int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 		   int mode,
 		   struct dlm_lockstatus *lksb,
 		   u32 flags,
 		   void *name,
 		   unsigned int namelen,
 		   void *astarg);
-enum dlm_status ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
+int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
 		     struct dlm_lockstatus *lksb,
 		     u32 flags,
 		     void *astarg);
-- 
1.5.4.1

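The comment in `o2dlm_unlock_ast_wrapper()` fixes the contract between the wrapper and `ocfs2_unlock_ast()`:

- A plain success is reported as `-DLM_EUNLOCK`.
- A cancel that lost the race with a grant is reported as `-DLM_ECANCEL`.
- Anything else is an error.

The simplified sketch below models both halves. Everything in it is a hypothetical stand-in:

- The status names replace the real `DLM_NORMAL` and `DLM_CANCELGRANT`.
- The `SK_E*` numbers are placeholders rather than the real `DLM_ECANCEL`/`DLM_EUNLOCK` values.
- The real receiver also checks `l_unlock_action` before treating `-DLM_ECANCEL` as a lost cancel; the sketch omits that check.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for DLM_NORMAL, DLM_CANCELGRANT and a failure. */
enum sk_status { SK_NORMAL, SK_CANCELGRANT, SK_DENIED };

/* Placeholder values, chosen to sit outside the normal errno range. */
#define SK_ECANCEL 0x10001
#define SK_EUNLOCK 0x10002

enum sk_outcome { SK_UNLOCKED, SK_CANCEL_LOST, SK_FAILED };

/* Wrapper half: translate the status, then report success as -EUNLOCK. */
static int sk_unlock_ast_error(enum sk_status status)
{
	int error;

	/* Stand-in for dlm_status_to_errno(). */
	switch (status) {
	case SK_NORMAL:
		error = 0;
		break;
	case SK_CANCELGRANT:
		error = -SK_ECANCEL;
		break;
	default:
		error = -EACCES;
		break;
	}

	/* A plain success is passed on as -EUNLOCK, as fs/dlm does. */
	if (!error)
		error = -SK_EUNLOCK;
	return error;
}

/* Receiver half: classify the error the way the unlock AST would. */
static enum sk_outcome sk_classify(int error)
{
	if (error == -SK_ECANCEL)
		return SK_CANCEL_LOST;	/* grant arrived before the cancel */
	if (error == -SK_EUNLOCK)
		return SK_UNLOCKED;	/* unlock or cancel took effect */
	return SK_FAILED;
}
```
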
* [Ocfs2-devel] [PATCH 11/62] ocfs2: Create the lock status block union.
  2008-04-02 20:14                   ` [Ocfs2-devel] [PATCH 10/62] ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API Mark Fasheh
@ 2008-04-02 20:14                     ` Mark Fasheh
  2008-04-02 20:14                       ` [Ocfs2-devel] [PATCH 12/62] ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Wrap the lock status block (lksb) in a union.  Later we will add a union
element for the fs/dlm lksb.  Create accessors for the status and lvb
fields.

Other than a debugging function, dlmglue.c does not directly reference
the o2dlm locking path anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |   23 +++++++++++++----------
 fs/ocfs2/ocfs2.h     |    5 +++--
 fs/ocfs2/stackglue.c |   29 ++++++++++++++++++++++-------
 fs/ocfs2/stackglue.h |   11 +++++++++--
 4 files changed, 47 insertions(+), 21 deletions(-)

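Wrapping the lksb in a union lets generic code read the status and LVB through accessors, without knowing which stack's structure is live. This patch only has the o2dlm member; the fs/dlm member is described as a later addition.

The minimal sketch below makes several assumptions:

- `sk_o2dlm_lksb` and `sk_fsdlm_lksb` are hypothetical layouts, the second loosely modeled on fs/dlm's `sb_status`/`sb_lvbptr` fields.
- The LVB is assumed to be 64 bytes.
- A global flag stands in for knowing the active stack.
- The accessor names echo `ocfs2_dlm_lock_status()` and `ocfs2_dlm_lvb()`, but their bodies are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SK_LVB_LEN 64	/* assumed LVB size for this sketch */

/* Hypothetical per-stack lock status blocks. */
struct sk_o2dlm_lksb {
	int status;		/* o2dlm-style status, 0 on success */
	char lvb[SK_LVB_LEN];	/* LVB embedded in the lksb */
};

struct sk_fsdlm_lksb {
	int sb_status;		/* already an -errno */
	char *sb_lvbptr;	/* LVB buffer owned by the caller */
};

union sk_dlm_lksb {
	struct sk_o2dlm_lksb lksb_o2dlm;
	struct sk_fsdlm_lksb lksb_fsdlm;
};

/* Which member is live; a real stack would not use a global. */
static int sk_using_fsdlm;

static int sk_dlm_lock_status(union sk_dlm_lksb *lksb)
{
	if (sk_using_fsdlm)
		return lksb->lksb_fsdlm.sb_status;
	/* Assumption: collapse every o2dlm failure to -EINVAL. */
	return lksb->lksb_o2dlm.status ? -EINVAL : 0;
}

static void *sk_dlm_lvb(union sk_dlm_lksb *lksb)
{
	if (sk_using_fsdlm)
		return lksb->lksb_fsdlm.sb_lvbptr;
	return lksb->lksb_o2dlm.lvb;
}
```
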
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 6a222a5..4590376 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -112,7 +112,8 @@ static void ocfs2_dump_meta_lvb_info(u64 level,
 				     unsigned int line,
 				     struct ocfs2_lock_res *lockres)
 {
-	struct ocfs2_meta_lvb *lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb;
+	struct ocfs2_meta_lvb *lvb =
+		(struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb);
 
 	mlog(level, "LVB information for %s (called from %s:%u):\n",
 	     lockres->l_name, function, line);
@@ -799,14 +800,14 @@ static void ocfs2_blocking_ast(void *opaque, int level)
 static void ocfs2_locking_ast(void *opaque)
 {
 	struct ocfs2_lock_res *lockres = opaque;
-	struct dlm_lockstatus *lksb = &lockres->l_lksb;
 	unsigned long flags;
 
 	spin_lock_irqsave(&lockres->l_lock, flags);
 
-	if (lksb->status != DLM_NORMAL) {
-		mlog(ML_ERROR, "lockres %s: lksb status value of %u!\n",
-		     lockres->l_name, lksb->status);
+	if (ocfs2_dlm_lock_status(&lockres->l_lksb)) {
+		mlog(ML_ERROR, "lockres %s: lksb status value of %d!\n",
+		     lockres->l_name,
+		     ocfs2_dlm_lock_status(&lockres->l_lksb));
 		spin_unlock_irqrestore(&lockres->l_lock, flags);
 		return;
 	}
@@ -1634,7 +1635,7 @@ static void __ocfs2_stuff_meta_lvb(struct inode *inode)
 
 	mlog_entry_void();
 
-	lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb;
+	lvb = (struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb);
 
 	/*
 	 * Invalidate the LVB of a deleted inode - this way other
@@ -1686,7 +1687,7 @@ static void ocfs2_refresh_inode_from_lvb(struct inode *inode)
 
 	mlog_meta_lvb(0, lockres);
 
-	lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb;
+	lvb = (struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb);
 
 	/* We're safe here without the lockres lock... */
 	spin_lock(&oi->ip_lock);
@@ -1721,7 +1722,8 @@ static void ocfs2_refresh_inode_from_lvb(struct inode *inode)
 static inline int ocfs2_meta_lvb_is_trustable(struct inode *inode,
 					      struct ocfs2_lock_res *lockres)
 {
-	struct ocfs2_meta_lvb *lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb;
+	struct ocfs2_meta_lvb *lvb =
+		(struct ocfs2_meta_lvb *)ocfs2_dlm_lvb(&lockres->l_lksb);
 
 	if (lvb->lvb_version == OCFS2_LVB_VERSION
 	    && be32_to_cpu(lvb->lvb_igeneration) == inode->i_generation)
@@ -2379,7 +2381,7 @@ static int ocfs2_dlm_seq_show(struct seq_file *m, void *v)
 		   lockres->l_blocking);
 
 	/* Dump the raw LVB */
-	lvb = lockres->l_lksb.lvb;
+	lvb = ocfs2_dlm_lvb(&lockres->l_lksb);
 	for(i = 0; i < DLM_LVB_LEN; i++)
 		seq_printf(m, "0x%x\t", lvb[i]);
 
@@ -2692,7 +2694,8 @@ static int ocfs2_drop_lock(struct ocfs2_super *osb,
 	if (ret) {
 		ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres);
 		mlog(ML_ERROR, "lockres flags: %lu\n", lockres->l_flags);
-		dlm_print_one_lock(lockres->l_lksb.lockid);
+		/* XXX Need to abstract this */
+		dlm_print_one_lock(lockres->l_lksb.lksb_o2dlm.lockid);
 		BUG();
 	}
 	mlog(0, "lock %s, successfull return from ocfs2_dlm_unlock\n",
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index f78e9ed..6d7c6d2 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -40,7 +40,8 @@
 #include "cluster/heartbeat.h"
 #include "cluster/tcp.h"
 
-#include "dlm/dlmapi.h"
+/* For union ocfs2_dlm_lksb */
+#include "stackglue.h"
 
 #include "ocfs2_fs.h"
 #include "ocfs2_lockid.h"
@@ -120,7 +121,7 @@ struct ocfs2_lock_res {
 	int                      l_level;
 	unsigned int             l_ro_holders;
 	unsigned int             l_ex_holders;
-	struct dlm_lockstatus    l_lksb;
+	union ocfs2_dlm_lksb     l_lksb;
 
 	/* used from AST/BAST funcs. */
 	enum ocfs2_ast_action    l_action;
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 0aec2fc..eb88854 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -199,7 +199,7 @@ static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
 
 int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 		   int mode,
-		   struct dlm_lockstatus *lksb,
+		   union ocfs2_dlm_lksb *lksb,
 		   u32 flags,
 		   void *name,
 		   unsigned int namelen,
@@ -212,15 +212,16 @@ int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 
 	BUG_ON(lproto == NULL);
 
-	status = dlmlock(dlm, o2dlm_mode, lksb, o2dlm_flags, name, namelen,
-		       o2dlm_lock_ast_wrapper, astarg,
-		       o2dlm_blocking_ast_wrapper);
+	status = dlmlock(dlm, o2dlm_mode, &lksb->lksb_o2dlm, o2dlm_flags,
+			 name, namelen,
+			 o2dlm_lock_ast_wrapper, astarg,
+			 o2dlm_blocking_ast_wrapper);
 	ret = dlm_status_to_errno(status);
 	return ret;
 }
 
 int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
-		     struct dlm_lockstatus *lksb,
+		     union ocfs2_dlm_lksb *lksb,
 		     u32 flags,
 		     void *astarg)
 {
@@ -230,12 +231,26 @@ int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
 
 	BUG_ON(lproto == NULL);
 
-	status = dlmunlock(dlm, lksb, o2dlm_flags,
-			 o2dlm_unlock_ast_wrapper, astarg);
+	status = dlmunlock(dlm, &lksb->lksb_o2dlm, o2dlm_flags,
+			   o2dlm_unlock_ast_wrapper, astarg);
 	ret = dlm_status_to_errno(status);
 	return ret;
 }
 
+int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
+{
+	return dlm_status_to_errno(lksb->lksb_o2dlm.status);
+}
+
+/*
+ * Why don't we cast to ocfs2_meta_lvb?  The "clean" answer is that we
+ * don't cast at the glue level.  The real answer is that the header
+ * ordering is nigh impossible.
+ */
+void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb)
+{
+	return (void *)(lksb->lksb_o2dlm.lvb);
+}
 
 void o2cb_get_stack(struct ocfs2_locking_protocol *proto)
 {
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 8ebcfba..3c91e24 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -40,18 +40,25 @@ struct ocfs2_locking_protocol {
 	void (*lp_unlock_ast)(void *astarg, int error);
 };
 
+union ocfs2_dlm_lksb {
+	struct dlm_lockstatus lksb_o2dlm;
+};
+
 int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 		   int mode,
-		   struct dlm_lockstatus *lksb,
+		   union ocfs2_dlm_lksb *lksb,
 		   u32 flags,
 		   void *name,
 		   unsigned int namelen,
 		   void *astarg);
 int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
-		     struct dlm_lockstatus *lksb,
+		     union ocfs2_dlm_lksb *lksb,
 		     u32 flags,
 		     void *astarg);
 
+int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb);
+void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb);
+
 void o2cb_get_stack(struct ocfs2_locking_protocol *proto);
 void o2cb_put_stack(void);
 
-- 
1.5.4.1

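Patch 11 hides the o2dlm lock status block behind union ocfs2_dlm_lksb and two accessors, ocfs2_dlm_lock_status() and ocfs2_dlm_lvb(). As a result, dlmglue.c no longer dereferences lksb->status or lksb->lvb directly. Below is a minimal userspace sketch of that pattern. It is not kernel code: the sketch_* types, the LVB length, and the abbreviated status-to-errno mapping are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Illustrative stand-ins; these are NOT the kernel's o2dlm definitions. */
#define SKETCH_LVB_LEN 64

enum sketch_dlm_status {
	SKETCH_DLM_NORMAL = 0,
	SKETCH_DLM_NOTQUEUED,
	SKETCH_DLM_IVLOCKID,
};

struct sketch_o2dlm_lockstatus {
	enum sketch_dlm_status status;
	char lvb[SKETCH_LVB_LEN];
};

/* Callers hold the union; only the glue layer knows which member is live. */
union sketch_dlm_lksb {
	struct sketch_o2dlm_lockstatus lksb_o2dlm;
};

/* Assumed, abbreviated status-to-errno mapping for this sketch only. */
static int sketch_status_to_errno(enum sketch_dlm_status status)
{
	switch (status) {
	case SKETCH_DLM_NORMAL:
		return 0;
	case SKETCH_DLM_NOTQUEUED:
		return -EAGAIN;
	default:
		return -EINVAL;
	}
}

void sketch_lksb_init(union sketch_dlm_lksb *lksb)
{
	memset(lksb, 0, sizeof(*lksb));
}

/* Same role as ocfs2_dlm_lock_status(): 0 on success, -errno otherwise. */
int sketch_dlm_lock_status(union sketch_dlm_lksb *lksb)
{
	assert(lksb != NULL);
	return sketch_status_to_errno(lksb->lksb_o2dlm.status);
}

/* Same role as ocfs2_dlm_lvb(): hand back an untyped pointer to the LVB. */
void *sketch_dlm_lvb(union sketch_dlm_lksb *lksb)
{
	assert(lksb != NULL);
	return lksb->lksb_o2dlm.lvb;
}
```

The union matters because supporting another stack would mean adding a member and teaching the accessors to select it. Callers such as ocfs2_locking_ast() would keep calling the same two functions.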

* [Ocfs2-devel] [PATCH 12/62] ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.
  2008-04-02 20:14                     ` [Ocfs2-devel] [PATCH 11/62] ocfs2: Create the lock status block union Mark Fasheh
@ 2008-04-02 20:14                       ` Mark Fasheh
  2008-04-02 20:14                         ` [Ocfs2-devel] [PATCH 13/62] ocfs2: Abstract out node number queries Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

This step introduces a cluster-stack-agnostic API for initializing and
exiting.  fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
connect to the stack; that is all handled in stackglue.c.

heartbeat.c no longer needs to know how it gets called.
ocfs2_do_node_down() is now a clean recovery trigger.

The big gotcha is the ordering of initializations and de-initializations done
underneath ocfs2_cluster_connect().  ocfs2_dlm_init() used to do all
o2dlm initialization in one block.  Thus, the o2dlm functionality of
ocfs2_cluster_connect() is very straightforward.  ocfs2_dlm_shutdown(),
however, did a few things between de-registration of the eviction
callback and actually shutting down the domain.  Now de-registration and
shutdown of the domain are wrapped within the single
ocfs2_cluster_disconnect() call.  I've checked the code paths to make
sure we can safely tear down things in ocfs2_dlm_shutdown() before
calling ocfs2_cluster_disconnect().  The filesystem has already set
itself to ignore the callback.
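The connect/disconnect lifecycle described above can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the stackglue.c code. The names sketch_conn, sketch_private and the helper functions are hypothetical, and the DLM domain join and eviction-callback registration are reduced to a flag. The sketch shows two things. Callers only see an opaque connection plus a recovery handler. Teardown of the callback and of the domain both happen inside the single disconnect call.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; names and fields are illustrative. */
#define SKETCH_GROUP_NAME_MAX 64

struct sketch_conn {
	char cc_name[SKETCH_GROUP_NAME_MAX];
	int cc_namelen;
	void (*cc_recovery_handler)(int node_num, void *recovery_data);
	void *cc_recovery_data;
	void *cc_private;		/* stack-specific state */
};

/* Stands in for o2dlm's private data (the eviction callback). */
struct sketch_private {
	int cb_registered;
};

int sketch_cluster_connect(const char *group, int grouplen,
			   void (*handler)(int, void *), void *data,
			   struct sketch_conn **conn)
{
	struct sketch_conn *new_conn;
	struct sketch_private *priv;

	assert(group != NULL && conn != NULL && handler != NULL);
	/* The <= 0 check is an extra guard added in this sketch. */
	if (grouplen <= 0 || grouplen > SKETCH_GROUP_NAME_MAX)
		return -EINVAL;

	new_conn = calloc(1, sizeof(*new_conn));
	if (!new_conn)
		return -ENOMEM;
	memcpy(new_conn->cc_name, group, grouplen);
	new_conn->cc_namelen = grouplen;
	new_conn->cc_recovery_handler = handler;
	new_conn->cc_recovery_data = data;

	priv = calloc(1, sizeof(*priv));
	if (!priv) {
		free(new_conn);
		return -ENOMEM;
	}
	/* Real code would join the DLM domain and register the callback here. */
	priv->cb_registered = 1;
	new_conn->cc_private = priv;

	*conn = new_conn;
	return 0;
}

/* What the stack does on node death: forward to the filesystem's handler. */
void sketch_node_evicted(struct sketch_conn *conn, int node_num)
{
	conn->cc_recovery_handler(node_num, conn->cc_recovery_data);
}

/* The callback is dropped before the domain is left, then state is freed. */
int sketch_cluster_disconnect(struct sketch_conn *conn)
{
	struct sketch_private *priv = conn->cc_private;

	priv->cb_registered = 0;
	free(priv);
	free(conn);
	return 0;
}

/* Example recovery handler: records the last node reported down. */
void sketch_record_node(int node_num, void *data)
{
	*(int *)data = node_num;
}
```

In this model, sketch_record_node plays the part that ocfs2_do_node_down() plays in the patch, receiving the opaque data pointer that was handed to connect.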

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |   97 ++++++++++++++++++-------------------
 fs/ocfs2/dlmglue.h   |    1 -
 fs/ocfs2/heartbeat.c |   40 +++------------
 fs/ocfs2/heartbeat.h |    2 +-
 fs/ocfs2/ocfs2.h     |    4 +-
 fs/ocfs2/stackglue.c |  131 +++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ocfs2/stackglue.h |   35 +++++++++++++-
 fs/ocfs2/super.c     |   13 ++---
 8 files changed, 221 insertions(+), 102 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 4590376..6652a48 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -27,7 +27,6 @@
 #include <linux/slab.h>
 #include <linux/highmem.h>
 #include <linux/mm.h>
-#include <linux/crc32.h>
 #include <linux/kthread.h>
 #include <linux/pagemap.h>
 #include <linux/debugfs.h>
@@ -259,31 +258,6 @@ static struct ocfs2_lock_res_ops ocfs2_flock_lops = {
 	.flags		= 0,
 };
 
-/*
- * This is the filesystem locking protocol version.
- *
- * Whenever the filesystem does new things with locks (adds or removes a
- * lock, orders them differently, does different things underneath a lock),
- * the version must be changed.  The protocol is negotiated when joining
- * the dlm domain.  A node may join the domain if its major version is
- * identical to all other nodes and its minor version is greater than
- * or equal to all other nodes.  When its minor version is greater than
- * the other nodes, it will run at the minor version specified by the
- * other nodes.
- *
- * If a locking change is made that will not be compatible with older
- * versions, the major number must be increased and the minor version set
- * to zero.  If a change merely adds a behavior that can be disabled when
- * speaking to older versions, the minor version must be increased.  If a
- * change adds a fully backwards compatible change (eg, LVB changes that
- * are just ignored by older versions), the version does not need to be
- * updated.
- */
-const struct dlm_protocol_version ocfs2_locking_protocol = {
-	.pv_major = OCFS2_LOCKING_PROTOCOL_MAJOR,
-	.pv_minor = OCFS2_LOCKING_PROTOCOL_MINOR,
-};
-
 static inline int ocfs2_is_inode_lock(struct ocfs2_lock_res *lockres)
 {
 	return lockres->l_type == OCFS2_LOCK_TYPE_META ||
@@ -886,7 +860,7 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 	lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
-	ret = ocfs2_dlm_lock(osb->dlm,
+	ret = ocfs2_dlm_lock(osb->cconn,
 			     level,
 			     &lockres->l_lksb,
 			     dlm_flags,
@@ -1085,7 +1059,7 @@ again:
 		     lockres->l_name, lockres->l_level, level);
 
 		/* call dlm_lock to upgrade lock now */
-		ret = ocfs2_dlm_lock(osb->dlm,
+		ret = ocfs2_dlm_lock(osb->cconn,
 				     level,
 				     &lockres->l_lksb,
 				     lkm_flags,
@@ -1492,7 +1466,7 @@ int ocfs2_file_lock(struct file *file, int ex, int trylock)
 	lockres_add_mask_waiter(lockres, &mw, OCFS2_LOCK_BUSY, 0);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
-	ret = ocfs2_dlm_lock(osb->dlm, level, &lockres->l_lksb, lkm_flags,
+	ret = ocfs2_dlm_lock(osb->cconn, level, &lockres->l_lksb, lkm_flags,
 			     lockres->l_name, OCFS2_LOCK_ID_MAX_LEN - 1,
 			     lockres);
 	if (ret) {
@@ -2485,8 +2459,7 @@ static void ocfs2_dlm_shutdown_debug(struct ocfs2_super *osb)
 int ocfs2_dlm_init(struct ocfs2_super *osb)
 {
 	int status = 0;
-	u32 dlm_key;
-	struct dlm_ctxt *dlm = NULL;
+	struct ocfs2_cluster_connection *conn = NULL;
 
 	mlog_entry_void();
 
@@ -2508,26 +2481,21 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
 		goto bail;
 	}
 
-	/* used by the dlm code to make message headers unique, each
-	 * node in this domain must agree on this. */
-	dlm_key = crc32_le(0, osb->uuid_str, strlen(osb->uuid_str));
-
 	/* for now, uuid == domain */
-	dlm = dlm_register_domain(osb->uuid_str, dlm_key,
-				  &osb->osb_locking_proto);
-	if (IS_ERR(dlm)) {
-		status = PTR_ERR(dlm);
+	status = ocfs2_cluster_connect(osb->uuid_str,
+				       strlen(osb->uuid_str),
+				       ocfs2_do_node_down, osb,
+				       &conn);
+	if (status) {
 		mlog_errno(status);
 		goto bail;
 	}
 
-	dlm_register_eviction_cb(dlm, &osb->osb_eviction_cb);
-
 local:
 	ocfs2_super_lock_res_init(&osb->osb_super_lockres, osb);
 	ocfs2_rename_lock_res_init(&osb->osb_rename_lockres, osb);
 
-	osb->dlm = dlm;
+	osb->cconn = conn;
 
 	status = 0;
 bail:
@@ -2545,10 +2513,14 @@ void ocfs2_dlm_shutdown(struct ocfs2_super *osb)
 {
 	mlog_entry_void();
 
-	dlm_unregister_eviction_cb(&osb->osb_eviction_cb);
-
 	ocfs2_drop_osb_locks(osb);
 
+	/*
+	 * Now that we have dropped all locks and ocfs2_dismount_volume()
+	 * has disabled recovery, the DLM won't be talking to us.  It's
+	 * safe to tear things down before disconnecting the cluster.
+	 */
+
 	if (osb->dc_task) {
 		kthread_stop(osb->dc_task);
 		osb->dc_task = NULL;
@@ -2557,8 +2529,8 @@ void ocfs2_dlm_shutdown(struct ocfs2_super *osb)
 	ocfs2_lock_res_free(&osb->osb_super_lockres);
 	ocfs2_lock_res_free(&osb->osb_rename_lockres);
 
-	dlm_unregister_domain(osb->dlm);
-	osb->dlm = NULL;
+	ocfs2_cluster_disconnect(osb->cconn);
+	osb->cconn = NULL;
 
 	ocfs2_dlm_shutdown_debug(osb);
 
@@ -2689,7 +2661,7 @@ static int ocfs2_drop_lock(struct ocfs2_super *osb,
 
 	mlog(0, "lock %s\n", lockres->l_name);
 
-	ret = ocfs2_dlm_unlock(osb->dlm, &lockres->l_lksb, lkm_flags,
+	ret = ocfs2_dlm_unlock(osb->cconn, &lockres->l_lksb, lkm_flags,
 			       lockres);
 	if (ret) {
 		ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres);
@@ -2823,7 +2795,7 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
 	if (lvb)
 		dlm_flags |= DLM_LKF_VALBLK;
 
-	ret = ocfs2_dlm_lock(osb->dlm,
+	ret = ocfs2_dlm_lock(osb->cconn,
 			     new_level,
 			     &lockres->l_lksb,
 			     dlm_flags,
@@ -2882,7 +2854,7 @@ static int ocfs2_cancel_convert(struct ocfs2_super *osb,
 	mlog_entry_void();
 	mlog(0, "lock %s\n", lockres->l_name);
 
-	ret = ocfs2_dlm_unlock(osb->dlm, &lockres->l_lksb,
+	ret = ocfs2_dlm_unlock(osb->cconn, &lockres->l_lksb,
 			       DLM_LKF_CANCEL, lockres);
 	if (ret) {
 		ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres);
@@ -3193,7 +3165,34 @@ static int ocfs2_dentry_convert_worker(struct ocfs2_lock_res *lockres,
 	return UNBLOCK_CONTINUE_POST;
 }
 
+/*
+ * This is the filesystem locking protocol.  It provides the lock handling
+ * hooks for the underlying DLM.  It has a maximum version number.
+ * The version number allows interoperability with systems running at
+ * the same major number and an equal or smaller minor number.
+ *
+ * Whenever the filesystem does new things with locks (adds or removes a
+ * lock, orders them differently, does different things underneath a lock),
+ * the version must be changed.  The protocol is negotiated when joining
+ * the dlm domain.  A node may join the domain if its major version is
+ * identical to all other nodes and its minor version is greater than
+ * or equal to all other nodes.  When its minor version is greater than
+ * the other nodes, it will run at the minor version specified by the
+ * other nodes.
+ *
+ * If a locking change is made that will not be compatible with older
+ * versions, the major number must be increased and the minor version set
+ * to zero.  If a change merely adds a behavior that can be disabled when
+ * speaking to older versions, the minor version must be increased.  If a
+ * change adds a fully backwards compatible change (eg, LVB changes that
+ * are just ignored by older versions), the version does not need to be
+ * updated.
+ */
 static struct ocfs2_locking_protocol lproto = {
+	.lp_max_version = {
+		.pv_major = OCFS2_LOCKING_PROTOCOL_MAJOR,
+		.pv_minor = OCFS2_LOCKING_PROTOCOL_MINOR,
+	},
 	.lp_lock_ast		= ocfs2_locking_ast,
 	.lp_blocking_ast	= ocfs2_blocking_ast,
 	.lp_unlock_ast		= ocfs2_unlock_ast,
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index 3238043..2d0a8a0 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -117,5 +117,4 @@ void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug *dlm_debug);
 void dlmglue_init_stack(void);
 void dlmglue_exit_stack(void);
 
-extern const struct dlm_protocol_version ocfs2_locking_protocol;
 #endif	/* DLMGLUE_H */
diff --git a/fs/ocfs2/heartbeat.c b/fs/ocfs2/heartbeat.c
index 80de239..dcac1a4 100644
--- a/fs/ocfs2/heartbeat.c
+++ b/fs/ocfs2/heartbeat.c
@@ -30,8 +30,6 @@
 #include <linux/highmem.h>
 #include <linux/kmod.h>
 
-#include <dlm/dlmapi.h>
-
 #define MLOG_MASK_PREFIX ML_SUPER
 #include <cluster/masklog.h>
 
@@ -64,19 +62,20 @@ void ocfs2_init_node_maps(struct ocfs2_super *osb)
 	ocfs2_node_map_init(&osb->osb_recovering_orphan_dirs);
 }
 
-static void ocfs2_do_node_down(int node_num,
-			       struct ocfs2_super *osb)
+void ocfs2_do_node_down(int node_num, void *data)
 {
+	struct ocfs2_super *osb = data;
+
 	BUG_ON(osb->node_num == node_num);
 
 	mlog(0, "ocfs2: node down event for %d\n", node_num);
 
-	if (!osb->dlm) {
+	if (!osb->cconn) {
 		/*
-		 * No DLM means we're not even ready to participate yet.
-		 * We check the slots after the DLM comes up, so we will
-		 * notice the node death then.  We can safely ignore it
-		 * here.
+		 * No cluster connection means we're not even ready to
+		 * participate yet.  We check the slots after the cluster
+		 * comes up, so we will notice the node death then.  We
+		 * can safely ignore it here.
 		 */
 		return;
 	}
@@ -84,29 +83,6 @@ static void ocfs2_do_node_down(int node_num,
 	ocfs2_recovery_thread(osb, node_num);
 }
 
-/* Called from the dlm when it's about to evict a node. We may also
- * get a heartbeat callback later. */
-static void ocfs2_dlm_eviction_cb(int node_num,
-				  void *data)
-{
-	struct ocfs2_super *osb = (struct ocfs2_super *) data;
-	struct super_block *sb = osb->sb;
-
-	mlog(ML_NOTICE, "device (%u,%u): dlm has evicted node %d\n",
-	     MAJOR(sb->s_dev), MINOR(sb->s_dev), node_num);
-
-	ocfs2_do_node_down(node_num, osb);
-}
-
-void ocfs2_setup_hb_callbacks(struct ocfs2_super *osb)
-{
-	/* Not exactly a heartbeat callback, but leads to essentially
-	 * the same path so we set it up here. */
-	dlm_setup_eviction_cb(&osb->osb_eviction_cb,
-			      ocfs2_dlm_eviction_cb,
-			      osb);
-}
-
 void ocfs2_stop_heartbeat(struct ocfs2_super *osb)
 {
 	int ret;
diff --git a/fs/ocfs2/heartbeat.h b/fs/ocfs2/heartbeat.h
index 98d8ffc..38e2450 100644
--- a/fs/ocfs2/heartbeat.h
+++ b/fs/ocfs2/heartbeat.h
@@ -28,7 +28,7 @@
 
 void ocfs2_init_node_maps(struct ocfs2_super *osb);
 
-void ocfs2_setup_hb_callbacks(struct ocfs2_super *osb);
+void ocfs2_do_node_down(int node_num, void *data);
 void ocfs2_stop_heartbeat(struct ocfs2_super *osb);
 
 /* node map functions - used to keep track of mounted and in-recovery
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6d7c6d2..664e4fe 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -248,12 +248,10 @@ struct ocfs2_super
 	struct ocfs2_alloc_stats alloc_stats;
 	char dev_str[20];		/* "major,minor" of the device */
 
-	struct dlm_ctxt *dlm;
+	struct ocfs2_cluster_connection *cconn;
 	struct ocfs2_lock_res osb_super_lockres;
 	struct ocfs2_lock_res osb_rename_lockres;
-	struct dlm_eviction_cb osb_eviction_cb;
 	struct ocfs2_dlm_debug *osb_dlm_debug;
-	struct dlm_protocol_version osb_locking_proto;
 
 	struct dentry *osb_debug_root;
 
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index eb88854..f6f309a 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -18,11 +18,21 @@
  * General Public License for more details.
  */
 
+#include <linux/slab.h>
+#include <linux/crc32.h>
+
+/* Needed for AOP_TRUNCATED_PAGE in mlog_errno() */
+#include <linux/fs.h>
+
 #include "cluster/masklog.h"
 #include "stackglue.h"
 
 static struct ocfs2_locking_protocol *lproto;
 
+struct o2dlm_private {
+	struct dlm_eviction_cb op_eviction_cb;
+};
+
 /* These should be identical */
 #if (DLM_LOCK_IV != LKM_IVMODE)
 # error Lock modes do not match
@@ -197,7 +207,7 @@ static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
 	lproto->lp_unlock_ast(astarg, error);
 }
 
-int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
+int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   int mode,
 		   union ocfs2_dlm_lksb *lksb,
 		   u32 flags,
@@ -212,15 +222,15 @@ int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
 
 	BUG_ON(lproto == NULL);
 
-	status = dlmlock(dlm, o2dlm_mode, &lksb->lksb_o2dlm, o2dlm_flags,
-			 name, namelen,
+	status = dlmlock(conn->cc_lockspace, o2dlm_mode, &lksb->lksb_o2dlm,
+			 o2dlm_flags, name, namelen,
 			 o2dlm_lock_ast_wrapper, astarg,
 			 o2dlm_blocking_ast_wrapper);
 	ret = dlm_status_to_errno(status);
 	return ret;
 }
 
-int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
+int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
 		     union ocfs2_dlm_lksb *lksb,
 		     u32 flags,
 		     void *astarg)
@@ -231,8 +241,8 @@ int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
 
 	BUG_ON(lproto == NULL);
 
-	status = dlmunlock(dlm, &lksb->lksb_o2dlm, o2dlm_flags,
-			   o2dlm_unlock_ast_wrapper, astarg);
+	status = dlmunlock(conn->cc_lockspace, &lksb->lksb_o2dlm,
+			   o2dlm_flags, o2dlm_unlock_ast_wrapper, astarg);
 	ret = dlm_status_to_errno(status);
 	return ret;
 }
@@ -252,6 +262,115 @@ void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb)
 	return (void *)(lksb->lksb_o2dlm.lvb);
 }
 
+/*
+ * Called from the dlm when it's about to evict a node. This is how the
+ * classic stack signals node death.
+ */
+static void o2dlm_eviction_cb(int node_num, void *data)
+{
+	struct ocfs2_cluster_connection *conn = data;
+
+	mlog(ML_NOTICE, "o2dlm has evicted node %d from group %.*s\n",
+	     node_num, conn->cc_namelen, conn->cc_name);
+
+	conn->cc_recovery_handler(node_num, conn->cc_recovery_data);
+}
+
+int ocfs2_cluster_connect(const char *group,
+			  int grouplen,
+			  void (*recovery_handler)(int node_num,
+						   void *recovery_data),
+			  void *recovery_data,
+			  struct ocfs2_cluster_connection **conn)
+{
+	int rc = 0;
+	struct ocfs2_cluster_connection *new_conn;
+	u32 dlm_key;
+	struct dlm_ctxt *dlm;
+	struct o2dlm_private *priv;
+	struct dlm_protocol_version dlm_version;
+
+	BUG_ON(group == NULL);
+	BUG_ON(conn == NULL);
+	BUG_ON(recovery_handler == NULL);
+
+	if (grouplen > GROUP_NAME_MAX) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	new_conn = kzalloc(sizeof(struct ocfs2_cluster_connection),
+			   GFP_KERNEL);
+	if (!new_conn) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	memcpy(new_conn->cc_name, group, grouplen);
+	new_conn->cc_namelen = grouplen;
+	new_conn->cc_recovery_handler = recovery_handler;
+	new_conn->cc_recovery_data = recovery_data;
+
+	/* Start the new connection at our maximum compatibility level */
+	new_conn->cc_version = lproto->lp_max_version;
+
+	priv = kzalloc(sizeof(struct o2dlm_private), GFP_KERNEL);
+	if (!priv) {
+		rc = -ENOMEM;
+		goto out_free;
+	}
+
+	/* This just fills the structure in.  It is safe to use new_conn. */
+	dlm_setup_eviction_cb(&priv->op_eviction_cb, o2dlm_eviction_cb,
+			      new_conn);
+
+	new_conn->cc_private = priv;
+
+	/* used by the dlm code to make message headers unique, each
+	 * node in this domain must agree on this. */
+	dlm_key = crc32_le(0, group, grouplen);
+	dlm_version.pv_major = new_conn->cc_version.pv_major;
+	dlm_version.pv_minor = new_conn->cc_version.pv_minor;
+
+	dlm = dlm_register_domain(group, dlm_key, &dlm_version);
+	if (IS_ERR(dlm)) {
+		rc = PTR_ERR(dlm);
+		mlog_errno(rc);
+		goto out_free;
+	}
+
+	new_conn->cc_version.pv_major = dlm_version.pv_major;
+	new_conn->cc_version.pv_minor = dlm_version.pv_minor;
+	new_conn->cc_lockspace = dlm;
+
+	dlm_register_eviction_cb(dlm, &priv->op_eviction_cb);
+
+	*conn = new_conn;
+
+out_free:
+	if (rc) {
+		kfree(new_conn->cc_private);
+		kfree(new_conn);
+	}
+
+out:
+	return rc;
+}
+
+int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
+{
+	struct dlm_ctxt *dlm = conn->cc_lockspace;
+	struct o2dlm_private *priv = conn->cc_private;
+
+	dlm_unregister_eviction_cb(&priv->op_eviction_cb);
+	dlm_unregister_domain(dlm);
+
+	kfree(priv);
+	kfree(conn);
+
+	return 0;
+}
+
 void o2cb_get_stack(struct ocfs2_locking_protocol *proto)
 {
 	BUG_ON(proto == NULL);
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 3c91e24..3900b5c 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -32,9 +32,22 @@
  */
 #define DLM_LKF_LOCAL		0x00100000
 
+/*
+ * This shadows DLM_LOCKSPACE_LEN in fs/dlm/dlm_internal.h.  That probably
+ * wants to be in a public header.
+ */
+#define GROUP_NAME_MAX		64
+
+
 #include "dlm/dlmapi.h"
 
+struct ocfs2_protocol_version {
+	u8 pv_major;
+	u8 pv_minor;
+};
+
 struct ocfs2_locking_protocol {
+	struct ocfs2_protocol_version lp_max_version;
 	void (*lp_lock_ast)(void *astarg);
 	void (*lp_blocking_ast)(void *astarg, int level);
 	void (*lp_unlock_ast)(void *astarg, int error);
@@ -44,14 +57,32 @@ union ocfs2_dlm_lksb {
 	struct dlm_lockstatus lksb_o2dlm;
 };
 
-int ocfs2_dlm_lock(struct dlm_ctxt *dlm,
+struct ocfs2_cluster_connection {
+	char cc_name[GROUP_NAME_MAX];
+	int cc_namelen;
+	struct ocfs2_protocol_version cc_version;
+	void (*cc_recovery_handler)(int node_num, void *recovery_data);
+	void *cc_recovery_data;
+	void *cc_lockspace;
+	void *cc_private;
+};
+
+int ocfs2_cluster_connect(const char *group,
+			  int grouplen,
+			  void (*recovery_handler)(int node_num,
+						   void *recovery_data),
+			  void *recovery_data,
+			  struct ocfs2_cluster_connection **conn);
+int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn);
+
+int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   int mode,
 		   union ocfs2_dlm_lksb *lksb,
 		   u32 flags,
 		   void *name,
 		   unsigned int namelen,
 		   void *astarg);
-int ocfs2_dlm_unlock(struct dlm_ctxt *dlm,
+int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
 		     union ocfs2_dlm_lksb *lksb,
 		     u32 flags,
 		     void *astarg);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index c867546..0ee4975 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1251,9 +1251,9 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 
 	ocfs2_sync_blockdev(sb);
 
-	/* No dlm means we've failed during mount, so skip all the
-	 * steps which depended on that to complete. */
-	if (osb->dlm) {
+	/* No cluster connection means we've failed during mount, so skip
+	 * all the steps which depended on that to complete. */
+	if (osb->cconn) {
 		tmp = ocfs2_super_lock(osb, 1);
 		if (tmp < 0) {
 			mlog_errno(tmp);
@@ -1264,12 +1264,12 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 	if (osb->slot_num != OCFS2_INVALID_SLOT)
 		ocfs2_put_slot(osb);
 
-	if (osb->dlm)
+	if (osb->cconn)
 		ocfs2_super_unlock(osb, 1);
 
 	ocfs2_release_system_inodes(osb);
 
-	if (osb->dlm)
+	if (osb->cconn)
 		ocfs2_dlm_shutdown(osb);
 
 	debugfs_remove(osb->osb_debug_root);
@@ -1341,7 +1341,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	sb->s_fs_info = osb;
 	sb->s_op = &ocfs2_sops;
 	sb->s_export_op = &ocfs2_export_ops;
-	osb->osb_locking_proto = ocfs2_locking_protocol;
 	sb->s_time_gran = 1;
 	sb->s_flags |= MS_NOATIME;
 	/* this is needed to support O_LARGEFILE */
@@ -1391,8 +1390,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	osb->local_alloc_state = OCFS2_LA_UNUSED;
 	osb->local_alloc_bh = NULL;
 
-	ocfs2_setup_hb_callbacks(osb);
-
 	init_waitqueue_head(&osb->osb_mount_event);
 
 	osb->vol_label = kmalloc(OCFS2_MAX_VOL_LABEL_LEN, GFP_KERNEL);
-- 
1.5.4.1

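Patch 12 moves the locking-protocol comment next to lproto. The rule it states can be written as a small pure function: join only with an identical major and an equal-or-greater minor, then run at the domain's minor. This sketch follows the comment, not o2dlm's actual negotiation code. It assumes that every node already in the domain runs at one common version. The type, the function name, the NULL-means-first-node convention, and the -EPROTO error are all illustrative choices.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_protocol_version {
	unsigned char pv_major;
	unsigned char pv_minor;
};

/*
 * Decide whether a node advertising *mine (its maximum) may join a
 * domain whose running version is *domain.  A NULL domain means this
 * is the first node, which runs at its maximum.  On success, *mine is
 * lowered to the version the node must actually run at.
 * -EPROTO is this sketch's choice of error, not necessarily o2dlm's.
 */
int sketch_negotiate_version(struct sketch_protocol_version *mine,
			     const struct sketch_protocol_version *domain)
{
	assert(mine != NULL);
	if (domain == NULL)
		return 0;
	/* Majors must match exactly. */
	if (mine->pv_major != domain->pv_major)
		return -EPROTO;
	/* Our minor must be >= the domain's minor... */
	if (mine->pv_minor < domain->pv_minor)
		return -EPROTO;
	/* ...and we then run at the domain's (lower or equal) minor. */
	mine->pv_minor = domain->pv_minor;
	return 0;
}
```

In ocfs2_cluster_connect() above, the negotiated value is copied back from dlm_version into cc_version after dlm_register_domain() returns. The sketch's in-place update of *mine mimics that write-back.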

* [Ocfs2-devel] [PATCH 13/62] ocfs2: Abstract out node number queries.
  2008-04-02 20:14                       ` [Ocfs2-devel] [PATCH 12/62] ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API Mark Fasheh
@ 2008-04-02 20:14                         ` Mark Fasheh
  2008-04-02 20:14                           ` [Ocfs2-devel] [PATCH 14/62] ocfs2: Move o2hb functionality into the stack glue Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

ocfs2 asks the cluster stack for the local node's node number for two
reasons: to fill the slot map and to print it. While the slot map isn't
necessary for userspace cluster stacks, the printed value is very useful
for debugging. Thus we add ocfs2_cluster_this_node() as a generic API to
get this value. It is anticipated that the slot map will not be used
under a userspace cluster stack, so validity checks of the node number
only need to exist in the slot map code. Otherwise, it just gets used and
printed as an opaque value.

[ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
  truly opaque. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/ocfs2.h     |    2 +-
 fs/ocfs2/slot_map.c  |    2 --
 fs/ocfs2/stackglue.c |   17 +++++++++++++++++
 fs/ocfs2/stackglue.h |    1 +
 fs/ocfs2/super.c     |   22 +++++++++++-----------
 5 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 664e4fe..7006aba 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -218,7 +218,7 @@ struct ocfs2_super
 	unsigned int s_atime_quantum;
 
 	unsigned int max_slots;
-	s16 node_num;
+	unsigned int node_num;
 	int slot_num;
 	int preferred_slot;
 	int s_sectsize_bits;
diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c
index 63fb1b2..bb5ff89 100644
--- a/fs/ocfs2/slot_map.c
+++ b/fs/ocfs2/slot_map.c
@@ -73,8 +73,6 @@ static void ocfs2_set_slot(struct ocfs2_slot_info *si,
 			   int slot_num, unsigned int node_num)
 {
 	BUG_ON((slot_num < 0) || (slot_num >= si->si_num_slots));
-	BUG_ON((node_num == O2NM_INVALID_NODE_NUM) ||
-	       (node_num >= O2NM_MAX_NODES));
 
 	si->si_slots[slot_num].sl_valid = 1;
 	si->si_slots[slot_num].sl_node_num = node_num;
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index f6f309a..8146863 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -25,6 +25,8 @@
 #include <linux/fs.h>
 
 #include "cluster/masklog.h"
+#include "cluster/nodemanager.h"
+
 #include "stackglue.h"
 
 static struct ocfs2_locking_protocol *lproto;
@@ -371,6 +373,21 @@ int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
 	return 0;
 }
 
+int ocfs2_cluster_this_node(unsigned int *node)
+{
+	int node_num;
+
+	node_num = o2nm_this_node();
+	if (node_num == O2NM_INVALID_NODE_NUM)
+		return -ENOENT;
+
+	if (node_num >= O2NM_MAX_NODES)
+		return -EOVERFLOW;
+
+	*node = node_num;
+	return 0;
+}
+
 void o2cb_get_stack(struct ocfs2_locking_protocol *proto)
 {
 	BUG_ON(proto == NULL);
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 3900b5c..ccb0399 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -74,6 +74,7 @@ int ocfs2_cluster_connect(const char *group,
 			  void *recovery_data,
 			  struct ocfs2_cluster_connection **conn);
 int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn);
+int ocfs2_cluster_this_node(unsigned int *node);
 
 int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   int mode,
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 0ee4975..d3c4d32 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -694,7 +694,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 	if (ocfs2_mount_local(osb))
 		snprintf(nodestr, sizeof(nodestr), "local");
 	else
-		snprintf(nodestr, sizeof(nodestr), "%d", osb->node_num);
+		snprintf(nodestr, sizeof(nodestr), "%u", osb->node_num);
 
 	printk(KERN_INFO "ocfs2: Mounting device (%s) on (node %s, slot %d) "
 	       "with %s data mode.\n",
@@ -1145,16 +1145,17 @@ static int ocfs2_fill_local_node_info(struct ocfs2_super *osb)
 	 * desirable. */
 	if (ocfs2_mount_local(osb))
 		osb->node_num = 0;
-	else
-		osb->node_num = o2nm_this_node();
-
-	if (osb->node_num == O2NM_MAX_NODES) {
-		mlog(ML_ERROR, "could not find this host's node number\n");
-		status = -ENOENT;
-		goto bail;
+	else {
+		status = ocfs2_cluster_this_node(&osb->node_num);
+		if (status < 0) {
+			mlog_errno(status);
+			mlog(ML_ERROR,
+			     "could not find this host's node number\n");
+			goto bail;
+		}
 	}
 
-	mlog(0, "I am node %d\n", osb->node_num);
+	mlog(0, "I am node %u\n", osb->node_num);
 
 	status = 0;
 bail:
@@ -1282,7 +1283,7 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 	if (ocfs2_mount_local(osb))
 		snprintf(nodestr, sizeof(nodestr), "local");
 	else
-		snprintf(nodestr, sizeof(nodestr), "%d", osb->node_num);
+		snprintf(nodestr, sizeof(nodestr), "%u", osb->node_num);
 
 	printk(KERN_INFO "ocfs2: Unmounting device (%s) on (node %s)\n",
 	       osb->dev_str, nodestr);
@@ -1384,7 +1385,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
 
 	osb->s_atime_quantum = OCFS2_DEFAULT_ATIME_QUANTUM;
 
-	osb->node_num = O2NM_INVALID_NODE_NUM;
 	osb->slot_num = OCFS2_INVALID_SLOT;
 
 	osb->local_alloc_state = OCFS2_LA_UNUSED;
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 14/62] ocfs2: Move o2hb functionality into the stack glue.
  2008-04-02 20:14                         ` [Ocfs2-devel] [PATCH 13/62] ocfs2: Abstract out node number queries Mark Fasheh
@ 2008-04-02 20:14                           ` Mark Fasheh
  2008-04-02 20:14                             ` [Ocfs2-devel] [PATCH 15/62] ocfs2: Fill node number during cluster stack init Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The last bit of the classic stack used directly in ocfs2 code is o2hb:
specifically, the check for heartbeat during mount and the call to
ocfs2_hb_ctl during unmount.

We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
to ocfs2_hb_ctl.  Other stacks will just leave hangup() empty.

The check for heartbeat is moved into ocfs2_cluster_connect().  It will
be matched by a similar check for other stacks.

With this change, only stackglue.c includes cluster/ headers.
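For the classic stack, the work that ocfs2_cluster_hangup() takes over amounts to two steps. It builds an argument vector for ocfs2_hb_ctl and hands it to call_usermodehelper(), as the removed ocfs2_stop_heartbeat() did. The sketch below reproduces only the argument construction in userspace C and executes nothing. The function name and the -1 return convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build the NULL-terminated argument vector that the removed
 * ocfs2_stop_heartbeat() passed to call_usermodehelper():
 *   <hb_ctl_path> -K -u <uuid>
 * Returns the argument count, or -1 when there is no UUID (mount
 * never got far enough), in which case argv is left untouched.
 */
int sketch_build_hb_ctl_argv(const char *hb_ctl_path, const char *uuid,
			     const char *argv[5])
{
	assert(hb_ctl_path != NULL && argv != NULL);
	if (uuid == NULL)
		return -1;
	argv[0] = hb_ctl_path;
	argv[1] = "-K";
	argv[2] = "-u";
	argv[3] = uuid;
	argv[4] = NULL;
	return 4;
}
```

A stack without a userspace heartbeat helper has nothing to build here, which is why its hangup can simply be empty.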

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |    4 ----
 fs/ocfs2/heartbeat.c |   33 ---------------------------------
 fs/ocfs2/heartbeat.h |    1 -
 fs/ocfs2/ioctl.c     |    1 +
 fs/ocfs2/ocfs2.h     |    4 ----
 fs/ocfs2/stackglue.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/stackglue.h |    1 +
 fs/ocfs2/super.c     |   23 ++++++++++-------------
 8 files changed, 62 insertions(+), 55 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 6652a48..05fd016 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -32,10 +32,6 @@
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 
-#include <cluster/heartbeat.h>
-#include <cluster/nodemanager.h>
-#include <cluster/tcp.h>
-
 #define MLOG_MASK_PREFIX ML_DLM_GLUE
 #include <cluster/masklog.h>
 
diff --git a/fs/ocfs2/heartbeat.c b/fs/ocfs2/heartbeat.c
index dcac1a4..c6e7213 100644
--- a/fs/ocfs2/heartbeat.c
+++ b/fs/ocfs2/heartbeat.c
@@ -28,7 +28,6 @@
 #include <linux/types.h>
 #include <linux/slab.h>
 #include <linux/highmem.h>
-#include <linux/kmod.h>
 
 #define MLOG_MASK_PREFIX ML_SUPER
 #include <cluster/masklog.h>
@@ -83,38 +82,6 @@ void ocfs2_do_node_down(int node_num, void *data)
 	ocfs2_recovery_thread(osb, node_num);
 }
 
-void ocfs2_stop_heartbeat(struct ocfs2_super *osb)
-{
-	int ret;
-	char *argv[5], *envp[3];
-
-	if (ocfs2_mount_local(osb))
-		return;
-
-	if (!osb->uuid_str) {
-		/* This can happen if we don't get far enough in mount... */
-		mlog(0, "No UUID with which to stop heartbeat!\n\n");
-		return;
-	}
-
-	argv[0] = (char *)o2nm_get_hb_ctl_path();
-	argv[1] = "-K";
-	argv[2] = "-u";
-	argv[3] = osb->uuid_str;
-	argv[4] = NULL;
-
-	mlog(0, "Run: %s %s %s %s\n", argv[0], argv[1], argv[2], argv[3]);
-
-	/* minimal command environment taken from cpu_run_sbin_hotplug */
-	envp[0] = "HOME=/";
-	envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
-	envp[2] = NULL;
-
-	ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
-	if (ret < 0)
-		mlog_errno(ret);
-}
-
 static inline void __ocfs2_node_map_set_bit(struct ocfs2_node_map *map,
 					    int bit)
 {
diff --git a/fs/ocfs2/heartbeat.h b/fs/ocfs2/heartbeat.h
index 38e2450..74b9c5d 100644
--- a/fs/ocfs2/heartbeat.h
+++ b/fs/ocfs2/heartbeat.h
@@ -29,7 +29,6 @@
 void ocfs2_init_node_maps(struct ocfs2_super *osb);
 
 void ocfs2_do_node_down(int node_num, void *data);
-void ocfs2_stop_heartbeat(struct ocfs2_super *osb);
 
 /* node map functions - used to keep track of mounted and in-recovery
  * nodes. */
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index 5177fba..ab1c216 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -7,6 +7,7 @@
 
 #include <linux/fs.h>
 #include <linux/mount.h>
+#include <linux/smp_lock.h>
 
 #define MLOG_MASK_PREFIX ML_INODE
 #include <cluster/masklog.h>
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 7006aba..31dc28b 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -36,10 +36,6 @@
 #include <linux/mutex.h>
 #include <linux/jbd.h>
 
-#include "cluster/nodemanager.h"
-#include "cluster/heartbeat.h"
-#include "cluster/tcp.h"
-
 /* For union ocfs2_dlm_lksb */
 #include "stackglue.h"
 
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 8146863..670fa94 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -20,12 +20,14 @@
 
 #include <linux/slab.h>
 #include <linux/crc32.h>
+#include <linux/kmod.h>
 
 /* Needed for AOP_TRUNCATED_PAGE in mlog_errno() */
 #include <linux/fs.h>
 
 #include "cluster/masklog.h"
 #include "cluster/nodemanager.h"
+#include "cluster/heartbeat.h"
 
 #include "stackglue.h"
 
@@ -301,6 +303,13 @@ int ocfs2_cluster_connect(const char *group,
 		goto out;
 	}
 
+	/* for now we only have one cluster/node, make sure we see it
+	 * in the heartbeat universe */
+	if (!o2hb_check_local_node_heartbeating()) {
+		rc = -EINVAL;
+		goto out;
+	}
+
 	new_conn = kzalloc(sizeof(struct ocfs2_cluster_connection),
 			   GFP_KERNEL);
 	if (!new_conn) {
@@ -359,6 +368,7 @@ out:
 	return rc;
 }
 
+
 int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
 {
 	struct dlm_ctxt *dlm = conn->cc_lockspace;
@@ -373,6 +383,46 @@ int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
 	return 0;
 }
 
+static void o2hb_stop(const char *group)
+{
+	int ret;
+	char *argv[5], *envp[3];
+
+	argv[0] = (char *)o2nm_get_hb_ctl_path();
+	argv[1] = "-K";
+	argv[2] = "-u";
+	argv[3] = (char *)group;
+	argv[4] = NULL;
+
+	mlog(0, "Run: %s %s %s %s\n", argv[0], argv[1], argv[2], argv[3]);
+
+	/* minimal command environment taken from cpu_run_sbin_hotplug */
+	envp[0] = "HOME=/";
+	envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
+	envp[2] = NULL;
+
+	ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
+	if (ret < 0)
+		mlog_errno(ret);
+}
+
+/*
+ * Hangup is a hack for tools compatibility.  Older ocfs2-tools software
+ * expects the filesystem to call "ocfs2_hb_ctl" during unmount.  This
+ * happens regardless of whether the DLM got started, so we can't do it
+ * in ocfs2_cluster_disconnect().  We bring the o2hb_stop() function into
+ * the glue and provide a "hangup" API for super.c to call.
+ *
+ * Other stacks will eventually provide a NULL ->hangup() pointer.
+ */
+void ocfs2_cluster_hangup(const char *group, int grouplen)
+{
+	BUG_ON(group == NULL);
+	BUG_ON(group[grouplen] != '\0');
+
+	o2hb_stop(group);
+}
+
 int ocfs2_cluster_this_node(unsigned int *node)
 {
 	int node_num;
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index ccb0399..22af77b 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -74,6 +74,7 @@ int ocfs2_cluster_connect(const char *group,
 			  void *recovery_data,
 			  struct ocfs2_cluster_connection **conn);
 int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn);
+void ocfs2_cluster_hangup(const char *group, int grouplen);
 int ocfs2_cluster_this_node(unsigned int *node);
 
 int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index d3c4d32..8f536b3 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -40,8 +40,7 @@
 #include <linux/crc32.h>
 #include <linux/debugfs.h>
 #include <linux/mount.h>
-
-#include <cluster/nodemanager.h>
+#include <linux/seq_file.h>
 
 #define MLOG_MASK_PREFIX ML_SUPER
 #include <cluster/masklog.h>
@@ -579,15 +578,6 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 		goto read_super_error;
 	}
 
-	/* for now we only have one cluster/node, make sure we see it
-	 * in the heartbeat universe */
-	if (parsed_options.mount_opt & OCFS2_MOUNT_HB_LOCAL) {
-		if (!o2hb_check_local_node_heartbeating()) {
-			status = -EINVAL;
-			goto read_super_error;
-		}
-	}
-
 	/* probe for superblock */
 	status = ocfs2_sb_probe(sb, &bh, &sector_size);
 	if (status < 0) {
@@ -1275,8 +1265,15 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 
 	debugfs_remove(osb->osb_debug_root);
 
-	if (!mnt_err)
-		ocfs2_stop_heartbeat(osb);
+	/*
+	 * This is a small hack to move ocfs2_hb_ctl into stackglue.
+	 * If we're dismounting due to mount error, mount.ocfs2 will clean
+	 * up heartbeat.  If we're a local mount, there is no heartbeat.
+	 * If we failed before we got a uuid_str yet, we can't stop
+	 * heartbeat.  Otherwise, do it.
+	 */
+	if (!mnt_err && !ocfs2_mount_local(osb) && osb->uuid_str)
+		ocfs2_cluster_hangup(osb->uuid_str, strlen(osb->uuid_str));
 
 	atomic_set(&osb->vol_state, VOLUME_DISMOUNTED);
 
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 15/62] ocfs2: Fill node number during cluster stack init
  2008-04-02 20:14                           ` [Ocfs2-devel] [PATCH 14/62] ocfs2: Move o2hb functionality into the stack glue Mark Fasheh
@ 2008-04-02 20:14                             ` Mark Fasheh
  2008-04-02 20:14                               ` [Ocfs2-devel] [PATCH 16/62] ocfs2: Remove CANCELGRANT from the view of dlmglue Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

It doesn't make sense to query for a node number before connecting to the
cluster stack. This should be safe because node_num is only printed before
this point, and we're only moving the setting of node_num a small amount
further into the mount process.

[ Disconnect when node query fails -- Joel ]

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c |   13 ++++++++++++-
 fs/ocfs2/super.c   |   33 ---------------------------------
 2 files changed, 12 insertions(+), 34 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 05fd016..c7653bb 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2459,8 +2459,10 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
 
 	mlog_entry_void();
 
-	if (ocfs2_mount_local(osb))
+	if (ocfs2_mount_local(osb)) {
+		osb->node_num = 0;
 		goto local;
+	}
 
 	status = ocfs2_dlm_init_debug(osb);
 	if (status < 0) {
@@ -2487,6 +2489,15 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
 		goto bail;
 	}
 
+	status = ocfs2_cluster_this_node(&osb->node_num);
+	if (status < 0) {
+		mlog_errno(status);
+		mlog(ML_ERROR,
+		     "could not find this host's node number\n");
+		ocfs2_cluster_disconnect(conn);
+		goto bail;
+	}
+
 local:
 	ocfs2_super_lock_res_init(&osb->osb_super_lockres, osb);
 	ocfs2_rename_lock_res_init(&osb->osb_rename_lockres, osb);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 8f536b3..fa9c46e 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -108,7 +108,6 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait);
 static int ocfs2_init_global_system_inodes(struct ocfs2_super *osb);
 static int ocfs2_init_local_system_inodes(struct ocfs2_super *osb);
 static void ocfs2_release_system_inodes(struct ocfs2_super *osb);
-static int ocfs2_fill_local_node_info(struct ocfs2_super *osb);
 static int ocfs2_check_volume(struct ocfs2_super *osb);
 static int ocfs2_verify_volume(struct ocfs2_dinode *di,
 			       struct buffer_head *bh,
@@ -1126,32 +1125,6 @@ static int ocfs2_get_sector(struct super_block *sb,
 	return 0;
 }
 
-/* ocfs2 1.0 only allows one cluster and node identity per kernel image. */
-static int ocfs2_fill_local_node_info(struct ocfs2_super *osb)
-{
-	int status;
-
-	/* XXX hold a ref on the node while mounte?  easy enough, if
-	 * desirable. */
-	if (ocfs2_mount_local(osb))
-		osb->node_num = 0;
-	else {
-		status = ocfs2_cluster_this_node(&osb->node_num);
-		if (status < 0) {
-			mlog_errno(status);
-			mlog(ML_ERROR,
-			     "could not find this host's node number\n");
-			goto bail;
-		}
-	}
-
-	mlog(0, "I am node %u\n", osb->node_num);
-
-	status = 0;
-bail:
-	return status;
-}
-
 static int ocfs2_mount_volume(struct super_block *sb)
 {
 	int status = 0;
@@ -1163,12 +1136,6 @@ static int ocfs2_mount_volume(struct super_block *sb)
 	if (ocfs2_is_hard_readonly(osb))
 		goto leave;
 
-	status = ocfs2_fill_local_node_info(osb);
-	if (status < 0) {
-		mlog_errno(status);
-		goto leave;
-	}
-
 	status = ocfs2_dlm_init(osb);
 	if (status < 0) {
 		mlog_errno(status);
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 16/62] ocfs2: Remove CANCELGRANT from the view of dlmglue.
  2008-04-02 20:14                             ` [Ocfs2-devel] [PATCH 15/62] ocfs2: Fill node number during cluster stack init Mark Fasheh
@ 2008-04-02 20:14                               ` Mark Fasheh
  2008-04-02 20:14                                 ` [Ocfs2-devel] [PATCH 17/62] ocfs2: handle async EAGAIN from NOQUEUE request Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

o2dlm has the non-standard behavior of providing a cancel callback
(unlock_ast) even when the cancel has failed (the locking operation
succeeded without canceling).  This is called CANCELGRANT after the
status code sent to the callback.  fs/dlm does not provide this
callback, so dlmglue must be changed to live without it.
o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.

Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
needs to check for it.  ocfs2_locking_ast() must catch that a cancel was
tried and clear the cancel state.

Making these changes opens up a locking race.  dlmglue uses the
OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
one time.  But dlmglue must unlock the lockres before calling into the
dlm.  In the small window of time between unlocking the lockres and
calling the dlm, the downconvert thread can try to cancel the lock.  The
downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
know that ocfs2_dlm_lock() has not yet been called.

Because ocfs2_dlm_lock() has not yet been called, the cancel operation
will just be a no-op.  There's nothing to cancel.  With CANCELGRANT,
dlmglue uses the CANCELGRANT callback to clear up the cancel state.
When it comes around again, it will retry the cancel.  Eventually, the
first thread will have called into ocfs2_dlm_lock(), and either the
lock or the cancel will succeed.  The downconvert thread can then do its
downconvert.

Without CANCELGRANT, there is nothing to clean up the cancellation
state.  The downconvert thread does not know to retry its operations.
More importantly, the original lock may be blocking on the other node
that is trying to cancel us.  With neither able to make progress, the
ast is never called and the cancellation state is never cleaned up that
way.  dlmglue is deadlocked.

The OCFS2_LOCK_PENDING flag is introduced to remedy this window.  It is
set at the same time OCFS2_LOCK_BUSY is.  Thus, the downconvert thread
can check whether the lock is cancelable.  If not, it just loops around
to try again.  Once ocfs2_dlm_lock() is called, the thread then clears
OCFS2_LOCK_PENDING and wakes the downconvert thread.  Now, if the
downconvert thread finds the lock BUSY, it can safely try to cancel it.
Whether the cancel works or not, the state will be properly set and the
lock processing can continue.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |  199 +++++++++++++++++++++++++++++++++++++++++++-------
 fs/ocfs2/ocfs2.h     |    4 +
 fs/ocfs2/stackglue.c |   40 +++-------
 3 files changed, 188 insertions(+), 55 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index c7653bb..295c47f 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -311,12 +311,13 @@ static int ocfs2_inode_lock_update(struct inode *inode,
 				  struct buffer_head **bh);
 static void ocfs2_drop_osb_locks(struct ocfs2_super *osb);
 static inline int ocfs2_highest_compat_lock_level(int level);
-static void ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
-				      int new_level);
+static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
+					      int new_level);
 static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
 				  struct ocfs2_lock_res *lockres,
 				  int new_level,
-				  int lvb);
+				  int lvb,
+				  unsigned int generation);
 static int ocfs2_prepare_cancel_convert(struct ocfs2_super *osb,
 				        struct ocfs2_lock_res *lockres);
 static int ocfs2_cancel_convert(struct ocfs2_super *osb,
@@ -736,6 +737,113 @@ static int ocfs2_generic_handle_bast(struct ocfs2_lock_res *lockres,
 	return needs_downconvert;
 }
 
+/*
+ * OCFS2_LOCK_PENDING and l_pending_gen.
+ *
+ * Why does OCFS2_LOCK_PENDING exist?  To close a race between setting
+ * OCFS2_LOCK_BUSY and calling ocfs2_dlm_lock().  See ocfs2_unblock_lock()
+ * for more details on the race.
+ *
+ * OCFS2_LOCK_PENDING closes the race quite nicely.  However, it introduces
+ * a race on itself.  In o2dlm, we can get the ast before ocfs2_dlm_lock()
+ * returns.  The ast clears OCFS2_LOCK_BUSY, and must therefore clear
+ * OCFS2_LOCK_PENDING at the same time.  When ocfs2_dlm_lock() returns,
+ * the caller is going to try to clear PENDING again.  If nothing else is
+ * happening, __lockres_clear_pending() sees PENDING is unset and does
+ * nothing.
+ *
+ * But what if another path (eg downconvert thread) has just started a
+ * new locking action?  The other path has re-set PENDING.  Our path
+ * cannot clear PENDING, because that will re-open the original race
+ * window.
+ *
+ * [Example]
+ *
+ * ocfs2_meta_lock()
+ *  ocfs2_cluster_lock()
+ *   set BUSY
+ *   set PENDING
+ *   drop l_lock
+ *   ocfs2_dlm_lock()
+ *    ocfs2_locking_ast()		ocfs2_downconvert_thread()
+ *     clear PENDING			 ocfs2_unblock_lock()
+ *					  take_l_lock
+ *					  !BUSY
+ *					  ocfs2_prepare_downconvert()
+ *					   set BUSY
+ *					   set PENDING
+ *					  drop l_lock
+ *   take l_lock
+ *   clear PENDING
+ *   drop l_lock
+ *			<window>
+ *					  ocfs2_dlm_lock()
+ *
+ * So as you can see, we now have a window where l_lock is not held,
+ * PENDING is not set, and ocfs2_dlm_lock() has not been called.
+ *
+ * The core problem is that ocfs2_cluster_lock() has cleared the PENDING
+ * set by ocfs2_prepare_downconvert().  That wasn't nice.
+ *
+ * To solve this we introduce l_pending_gen.  A call to
+ * lockres_clear_pending() will only do so when it is passed a generation
+ * number that matches the lockres.  lockres_set_pending() will return the
+ * current generation number.  When ocfs2_cluster_lock() goes to clear
+ * PENDING, it passes the generation it got from set_pending().  In our
+ * example above, the generation numbers will *not* match.  Thus,
+ * ocfs2_cluster_lock() will not clear the PENDING set by
+ * ocfs2_prepare_downconvert().
+ */
+
+/* Unlocked version for ocfs2_locking_ast() */
+static void __lockres_clear_pending(struct ocfs2_lock_res *lockres,
+				    unsigned int generation,
+				    struct ocfs2_super *osb)
+{
+	assert_spin_locked(&lockres->l_lock);
+
+	/*
+	 * The ast and locking functions can race us here.  The winner
+	 * will clear pending, the loser will not.
+	 */
+	if (!(lockres->l_flags & OCFS2_LOCK_PENDING) ||
+	    (lockres->l_pending_gen != generation))
+		return;
+
+	lockres_clear_flags(lockres, OCFS2_LOCK_PENDING);
+	lockres->l_pending_gen++;
+
+	/*
+	 * The downconvert thread may have skipped us because we
+	 * were PENDING.  Wake it up.
+	 */
+	if (lockres->l_flags & OCFS2_LOCK_BLOCKED)
+		ocfs2_wake_downconvert_thread(osb);
+}
+
+/* Locked version for callers of ocfs2_dlm_lock() */
+static void lockres_clear_pending(struct ocfs2_lock_res *lockres,
+				  unsigned int generation,
+				  struct ocfs2_super *osb)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&lockres->l_lock, flags);
+	__lockres_clear_pending(lockres, generation, osb);
+	spin_unlock_irqrestore(&lockres->l_lock, flags);
+}
+
+static unsigned int lockres_set_pending(struct ocfs2_lock_res *lockres)
+{
+	assert_spin_locked(&lockres->l_lock);
+	BUG_ON(!(lockres->l_flags & OCFS2_LOCK_BUSY));
+
+	lockres_or_flags(lockres, OCFS2_LOCK_PENDING);
+
+	return lockres->l_pending_gen;
+}
+
+
 static void ocfs2_blocking_ast(void *opaque, int level)
 {
 	struct ocfs2_lock_res *lockres = opaque;
@@ -770,6 +878,7 @@ static void ocfs2_blocking_ast(void *opaque, int level)
 static void ocfs2_locking_ast(void *opaque)
 {
 	struct ocfs2_lock_res *lockres = opaque;
+	struct ocfs2_super *osb = ocfs2_get_lockres_osb(lockres);
 	unsigned long flags;
 
 	spin_lock_irqsave(&lockres->l_lock, flags);
@@ -805,6 +914,18 @@ static void ocfs2_locking_ast(void *opaque)
 	 * can catch it. */
 	lockres->l_action = OCFS2_AST_INVALID;
 
+	/* Did we try to cancel this lock?  Clear that state */
+	if (lockres->l_unlock_action == OCFS2_UNLOCK_CANCEL_CONVERT)
+		lockres->l_unlock_action = OCFS2_UNLOCK_INVALID;
+
+	/*
+	 * We may have beaten the locking functions here.  We certainly
+	 * know that dlm_lock() has been called :-)
+	 * Because we can't have two lock calls in flight at once, we
+	 * can use lockres->l_pending_gen.
+	 */
+	__lockres_clear_pending(lockres, lockres->l_pending_gen,  osb);
+
 	wake_up(&lockres->l_event);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 }
@@ -838,6 +959,7 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 {
 	int ret = 0;
 	unsigned long flags;
+	unsigned int gen;
 
 	mlog_entry_void();
 
@@ -854,6 +976,7 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 	lockres->l_action = OCFS2_AST_ATTACH;
 	lockres->l_requested = level;
 	lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
+	gen = lockres_set_pending(lockres);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
 	ret = ocfs2_dlm_lock(osb->cconn,
@@ -863,6 +986,7 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 			     lockres->l_name,
 			     OCFS2_LOCK_ID_MAX_LEN - 1,
 			     lockres);
+	lockres_clear_pending(lockres, gen, osb);
 	if (ret) {
 		ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres);
 		ocfs2_recover_from_dlm_error(lockres, 1);
@@ -988,6 +1112,7 @@ static int ocfs2_cluster_lock(struct ocfs2_super *osb,
 	int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR);
 	int ret = 0; /* gcc doesn't realize wait = 1 guarantees ret is set */
 	unsigned long flags;
+	unsigned int gen;
 
 	mlog_entry_void();
 
@@ -1046,6 +1171,7 @@ again:
 
 		lockres->l_requested = level;
 		lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
+		gen = lockres_set_pending(lockres);
 		spin_unlock_irqrestore(&lockres->l_lock, flags);
 
 		BUG_ON(level == DLM_LOCK_IV);
@@ -1062,6 +1188,7 @@ again:
 				     lockres->l_name,
 				     OCFS2_LOCK_ID_MAX_LEN - 1,
 				     lockres);
+		lockres_clear_pending(lockres, gen, osb);
 		if (ret) {
 			if (!(lkm_flags & DLM_LKF_NOQUEUE) ||
 			    (ret != -EAGAIN)) {
@@ -1506,6 +1633,7 @@ out:
 void ocfs2_file_unlock(struct file *file)
 {
 	int ret;
+	unsigned int gen;
 	unsigned long flags;
 	struct ocfs2_file_private *fp = file->private_data;
 	struct ocfs2_lock_res *lockres = &fp->fp_flock;
@@ -1531,11 +1659,11 @@ void ocfs2_file_unlock(struct file *file)
 	lockres_or_flags(lockres, OCFS2_LOCK_BLOCKED);
 	lockres->l_blocking = DLM_LOCK_EX;
 
-	ocfs2_prepare_downconvert(lockres, LKM_NLMODE);
+	gen = ocfs2_prepare_downconvert(lockres, LKM_NLMODE);
 	lockres_add_mask_waiter(lockres, &mw, OCFS2_LOCK_BUSY, 0);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
-	ret = ocfs2_downconvert_lock(osb, lockres, LKM_NLMODE, 0);
+	ret = ocfs2_downconvert_lock(osb, lockres, LKM_NLMODE, 0, gen);
 	if (ret) {
 		mlog_errno(ret);
 		return;
@@ -2555,23 +2683,7 @@ static void ocfs2_unlock_ast(void *opaque, int error)
 	     lockres->l_unlock_action);
 
 	spin_lock_irqsave(&lockres->l_lock, flags);
-	/* We tried to cancel a convert request, but it was already
-	 * granted. All we want to do here is clear our unlock
-	 * state. The wake_up call done at the bottom is redundant
-	 * (ocfs2_prepare_cancel_convert doesn't sleep on this) but doesn't
-	 * hurt anything anyway */
-	if (error == -DLM_ECANCEL &&
-	    lockres->l_unlock_action == OCFS2_UNLOCK_CANCEL_CONVERT) {
-		mlog(0, "Got cancelgrant for %s\n", lockres->l_name);
-
-		/* We don't clear the busy flag in this case as it
-		 * should have been cleared by the ast which the dlm
-		 * has called. */
-		goto complete_unlock;
-	}
-
-	/* DLM_EUNLOCK is the success code for unlock */
-	if (error != -DLM_EUNLOCK) {
+	if (error) {
 		mlog(ML_ERROR, "Dlm passes error %d for lock %s, "
 		     "unlock_action %d\n", error, lockres->l_name,
 		     lockres->l_unlock_action);
@@ -2592,7 +2704,6 @@ static void ocfs2_unlock_ast(void *opaque, int error)
 	}
 
 	lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
-complete_unlock:
 	lockres->l_unlock_action = OCFS2_UNLOCK_INVALID;
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
 
@@ -2768,8 +2879,8 @@ int ocfs2_drop_inode_locks(struct inode *inode)
 	return status;
 }
 
-static void ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
-				      int new_level)
+static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
+					      int new_level)
 {
 	assert_spin_locked(&lockres->l_lock);
 
@@ -2787,12 +2898,14 @@ static void ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
 	lockres->l_action = OCFS2_AST_DOWNCONVERT;
 	lockres->l_requested = new_level;
 	lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
+	return lockres_set_pending(lockres);
 }
 
 static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
 				  struct ocfs2_lock_res *lockres,
 				  int new_level,
-				  int lvb)
+				  int lvb,
+				  unsigned int generation)
 {
 	int ret;
 	u32 dlm_flags = DLM_LKF_CONVERT;
@@ -2809,6 +2922,7 @@ static int ocfs2_downconvert_lock(struct ocfs2_super *osb,
 			     lockres->l_name,
 			     OCFS2_LOCK_ID_MAX_LEN - 1,
 			     lockres);
+	lockres_clear_pending(lockres, generation, osb);
 	if (ret) {
 		ocfs2_log_dlm_error("ocfs2_dlm_lock", ret, lockres);
 		ocfs2_recover_from_dlm_error(lockres, 1);
@@ -2883,6 +2997,7 @@ static int ocfs2_unblock_lock(struct ocfs2_super *osb,
 	int new_level;
 	int ret = 0;
 	int set_lvb = 0;
+	unsigned int gen;
 
 	mlog_entry_void();
 
@@ -2892,6 +3007,32 @@ static int ocfs2_unblock_lock(struct ocfs2_super *osb,
 
 recheck:
 	if (lockres->l_flags & OCFS2_LOCK_BUSY) {
+		/* XXX
+		 * This is a *big* race.  The OCFS2_LOCK_PENDING flag
+		 * exists entirely for one reason - another thread has set
+		 * OCFS2_LOCK_BUSY, but has *NOT* yet called dlm_lock().
+		 *
+		 * If we do ocfs2_cancel_convert() before the other thread
+		 * calls dlm_lock(), our cancel will do nothing.  We will
+		 * get no ast, and we will have no way of knowing the
+		 * cancel failed.  Meanwhile, the other thread will call
+		 * into dlm_lock() and wait...forever.
+		 *
+		 * Why forever?  Because another node has asked for the
+		 * lock first; that's why we're here in unblock_lock().
+		 *
+		 * The solution is OCFS2_LOCK_PENDING.  When PENDING is
+		 * set, we just requeue the unblock.  Only when the other
+		 * thread has called dlm_lock() and cleared PENDING will
+		 * we then cancel their request.
+		 *
+		 * All callers of dlm_lock() must set OCFS2_LOCK_PENDING
+		 * at the same time they set OCFS2_LOCK_BUSY.  They must
+		 * clear OCFS2_LOCK_PENDING after dlm_lock() returns.
+		 */
+		if (lockres->l_flags & OCFS2_LOCK_PENDING)
+			goto leave_requeue;
+
 		ctl->requeue = 1;
 		ret = ocfs2_prepare_cancel_convert(osb, lockres);
 		spin_unlock_irqrestore(&lockres->l_lock, flags);
@@ -2971,9 +3112,11 @@ downconvert:
 			lockres->l_ops->set_lvb(lockres);
 	}
 
-	ocfs2_prepare_downconvert(lockres, new_level);
+	gen = ocfs2_prepare_downconvert(lockres, new_level);
 	spin_unlock_irqrestore(&lockres->l_lock, flags);
-	ret = ocfs2_downconvert_lock(osb, lockres, new_level, set_lvb);
+	ret = ocfs2_downconvert_lock(osb, lockres, new_level, set_lvb,
+				     gen);
+
 leave:
 	mlog_exit(ret);
 	return ret;
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 31dc28b..af929ec 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -98,6 +98,9 @@ enum ocfs2_unlock_action {
 					       * dropped. */
 #define OCFS2_LOCK_QUEUED        (0x00000100) /* queued for downconvert */
 #define OCFS2_LOCK_NOCACHE       (0x00000200) /* don't use a holder count */
+#define OCFS2_LOCK_PENDING       (0x00000400) /* This lockres is pending a
+						 call to dlm_lock.  Only
+						 exists with BUSY set. */
 
 struct ocfs2_lock_res_ops;
 
@@ -124,6 +127,7 @@ struct ocfs2_lock_res {
 	enum ocfs2_unlock_action l_unlock_action;
 	int                      l_requested;
 	int                      l_blocking;
+	unsigned int             l_pending_gen;
 
 	wait_queue_head_t        l_event;
 
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 670fa94..abdb9f6 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -104,8 +104,8 @@ static int flags_to_o2dlm(u32 flags)
  *
  * DLM_NORMAL:		0
  * DLM_NOTQUEUED:	-EAGAIN
- * DLM_CANCELGRANT:	-DLM_ECANCEL
- * DLM_CANCEL:		-DLM_EUNLOCK
+ * DLM_CANCELGRANT:	-EBUSY
+ * DLM_CANCEL:		-DLM_ECANCEL
  */
 /* Keep in sync with dlmapi.h */
 static int status_map[] = {
@@ -113,13 +113,13 @@ static int status_map[] = {
 	[DLM_GRANTED]			= -EINVAL,
 	[DLM_DENIED]			= -EACCES,
 	[DLM_DENIED_NOLOCKS]		= -EACCES,
-	[DLM_WORKING]			= -EBUSY,
+	[DLM_WORKING]			= -EACCES,
 	[DLM_BLOCKED]			= -EINVAL,
 	[DLM_BLOCKED_ORPHAN]		= -EINVAL,
 	[DLM_DENIED_GRACE_PERIOD]	= -EACCES,
 	[DLM_SYSERR]			= -ENOMEM,	/* It is what it is */
 	[DLM_NOSUPPORT]			= -EPROTO,
-	[DLM_CANCELGRANT]		= -DLM_ECANCEL, /* Cancel after grant */
+	[DLM_CANCELGRANT]		= -EBUSY,	/* Cancel after grant */
 	[DLM_IVLOCKID]			= -EINVAL,
 	[DLM_SYNC]			= -EINVAL,
 	[DLM_BADTYPE]			= -EINVAL,
@@ -137,7 +137,7 @@ static int status_map[] = {
 	[DLM_VALNOTVALID]		= -EINVAL,
 	[DLM_REJECTED]			= -EPERM,
 	[DLM_ABORT]			= -EINVAL,
-	[DLM_CANCEL]			= -DLM_EUNLOCK,	/* Successful cancel */
+	[DLM_CANCEL]			= -DLM_ECANCEL,	/* Successful cancel */
 	[DLM_IVRESHANDLE]		= -EINVAL,
 	[DLM_DEADLOCK]			= -EDEADLK,
 	[DLM_DENIED_NOASTS]		= -EINVAL,
@@ -152,6 +152,7 @@ static int status_map[] = {
 	[DLM_MIGRATING]			= -ERESTART,
 	[DLM_MAXSTATS]			= -EINVAL,
 };
+
 static int dlm_status_to_errno(enum dlm_status status)
 {
 	BUG_ON(status > (sizeof(status_map) / sizeof(status_map[0])));
@@ -175,38 +176,23 @@ static void o2dlm_blocking_ast_wrapper(void *astarg, int level)
 
 static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
 {
-	int error;
+	int error = dlm_status_to_errno(status);
 
 	BUG_ON(lproto == NULL);
 
 	/*
-	 * XXX: CANCEL values are sketchy.
-	 *
-	 * Currently we have preserved the o2dlm paradigm.  You can get
-	 * unlock_ast() whether the cancel succeded or not.
-	 *
-	 * First, we're going to pass DLM_EUNLOCK just like fs/dlm does for
-	 * successful unlocks.  That is a clean behavior.
-	 *
 	 * In o2dlm, you can get both the lock_ast() for the lock being
 	 * granted and the unlock_ast() for the CANCEL failing.  A
 	 * successful cancel sends DLM_NORMAL here.  If the
 	 * lock grant happened before the cancel arrived, you get
-	 * DLM_CANCELGRANT.  For now, we'll use DLM_ECANCEL to signify
-	 * CANCELGRANT - the CANCEL was supposed to happen but didn't.  We
-	 * can then use DLM_EUNLOCK to signify a successful CANCEL -
-	 * effectively, the CANCEL caused the lock to roll back.
+	 * DLM_CANCELGRANT.
 	 *
-	 * In the future, we will likely move the o2dlm to send only one
-	 * ast - either unlock_ast() for a successful CANCEL or lock_ast()
-	 * when the grant succeeds.  At that point, we'll send DLM_ECANCEL
-	 * for all cancel results (CANCELGRANT will no longer exist).
+	 * There's no need for the double-ast.  If we see DLM_CANCELGRANT,
+	 * we just ignore it.  We expect the lock_ast() to handle the
+	 * granted lock.
 	 */
-	error = dlm_status_to_errno(status);
-
-	/* Successful unlock is DLM_EUNLOCK */
-	if (!error)
-		error = -DLM_EUNLOCK;
+	if (status == DLM_CANCELGRANT)
+		return;
 
 	lproto->lp_unlock_ast(astarg, error);
 }
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 17/62] ocfs2: handle async EAGAIN from NOQUEUE request
  2008-04-02 20:14                               ` [Ocfs2-devel] [PATCH 16/62] ocfs2: Remove CANCELGRANT from the view of dlmglue Mark Fasheh
@ 2008-04-02 20:14                                 ` Mark Fasheh
  2008-04-02 20:14                                   ` [Ocfs2-devel] [PATCH 18/62] ocfs2: Abstract out a debugging function for underlying dlms Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, David Teigland

From: David Teigland <teigland@redhat.com>

When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE
requests. Fix up dlmglue to expect this.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c |   27 +++++++++++++++++++++++----
 1 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 295c47f..b640423 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -880,13 +880,20 @@ static void ocfs2_locking_ast(void *opaque)
 	struct ocfs2_lock_res *lockres = opaque;
 	struct ocfs2_super *osb = ocfs2_get_lockres_osb(lockres);
 	unsigned long flags;
+	int status;
 
 	spin_lock_irqsave(&lockres->l_lock, flags);
 
-	if (ocfs2_dlm_lock_status(&lockres->l_lksb)) {
+	status = ocfs2_dlm_lock_status(&lockres->l_lksb);
+
+	if (status == -EAGAIN) {
+		lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
+		goto out;
+	}
+
+	if (status) {
 		mlog(ML_ERROR, "lockres %s: lksb status value of %d!\n",
-		     lockres->l_name,
-		     ocfs2_dlm_lock_status(&lockres->l_lksb));
+		     lockres->l_name, status);
 		spin_unlock_irqrestore(&lockres->l_lock, flags);
 		return;
 	}
@@ -909,7 +916,7 @@ static void ocfs2_locking_ast(void *opaque)
 		     lockres->l_unlock_action);
 		BUG();
 	}
-
+out:
 	/* set it to something invalid so if we get called again we
 	 * can catch it. */
 	lockres->l_action = OCFS2_AST_INVALID;
@@ -1113,6 +1120,7 @@ static int ocfs2_cluster_lock(struct ocfs2_super *osb,
 	int ret = 0; /* gcc doesn't realize wait = 1 guarantees ret is set */
 	unsigned long flags;
 	unsigned int gen;
+	int noqueue_attempted = 0;
 
 	mlog_entry_void();
 
@@ -1157,6 +1165,13 @@ again:
 	}
 
 	if (level > lockres->l_level) {
+		if (noqueue_attempted > 0) {
+			ret = -EAGAIN;
+			goto unlock;
+		}
+		if (lkm_flags & DLM_LKF_NOQUEUE)
+			noqueue_attempted = 1;
+
 		if (lockres->l_action != OCFS2_AST_INVALID)
 			mlog(ML_ERROR, "lockres %s has action %u pending\n",
 			     lockres->l_name, lockres->l_action);
@@ -1621,6 +1636,10 @@ int ocfs2_file_lock(struct file *file, int ex, int trylock)
 		 * to just bubble sucess back up to the user.
 		 */
 		ret = ocfs2_flock_handle_signal(lockres, level);
+	} else if (!ret && (level > lockres->l_level)) {
+		/* Trylock failed asynchronously */
+		BUG_ON(!trylock);
+		ret = -EAGAIN;
 	}
 
 out:
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 18/62] ocfs2: Abstract out a debugging function for underlying dlms.
  2008-04-02 20:14                                 ` [Ocfs2-devel] [PATCH 17/62] ocfs2: handle async EAGAIN from NOQUEUE request Mark Fasheh
@ 2008-04-02 20:14                                   ` Mark Fasheh
  2008-04-02 20:14                                     ` [Ocfs2-devel] [PATCH 19/62] ocfs2: Clean up stackglue initialization Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

dlmglue.c was still referencing a raw o2dlm lksb in one place,
ocfs2_drop_lock().  Create a generic ocfs2_dlm_dump_lksb() function so
that each underlying DLM can print whatever it wants about its lock.

The o2dlm dump then moves into stackglue.c, where it belongs.
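
The point of the abstraction is that callers hand over the whole lksb union and only the glue layer knows which member is live. A minimal user-space sketch, assuming a hypothetical two-member union: `union sketch_lksb`, its o2dlm and fsdlm member layouts, and the `sketch_using_o2dlm` switch are all illustrative, not the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-stack lock status blocks. */
struct sketch_o2dlm_lksb { char lockid[32]; int status; };
struct sketch_fsdlm_lksb { unsigned int sb_lkid; int sb_status; };

union sketch_lksb {
	struct sketch_o2dlm_lksb o2dlm;
	struct sketch_fsdlm_lksb fsdlm;
};

/* Only the glue layer knows which union member is in use. */
static int sketch_using_o2dlm = 1;

/* Generic dump: callers never reach into a stack-specific member. */
static int sketch_dump_lksb(const union sketch_lksb *lksb, char *buf,
			    size_t len)
{
	if (sketch_using_o2dlm)
		return snprintf(buf, len, "o2dlm lock %s status %d",
				lksb->o2dlm.lockid, lksb->o2dlm.status);
	return snprintf(buf, len, "fsdlm lkid %u status %d",
			lksb->fsdlm.sb_lkid, lksb->fsdlm.sb_status);
}
```

A caller such as a drop-lock error path only needs the union pointer; swapping stacks changes the dump output without touching the caller.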

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |    3 +--
 fs/ocfs2/stackglue.c |    5 +++++
 fs/ocfs2/stackglue.h |    1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index b640423..f41ff1c 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2803,8 +2803,7 @@ static int ocfs2_drop_lock(struct ocfs2_super *osb,
 	if (ret) {
 		ocfs2_log_dlm_error("ocfs2_dlm_unlock", ret, lockres);
 		mlog(ML_ERROR, "lockres flags: %lu\n", lockres->l_flags);
-		/* XXX Need to abstract this */
-		dlm_print_one_lock(lockres->l_lksb.lksb_o2dlm.lockid);
+		ocfs2_dlm_dump_lksb(&lockres->l_lksb);
 		BUG();
 	}
 	mlog(0, "lock %s, successfull return from ocfs2_dlm_unlock\n",
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index abdb9f6..bd80541 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -252,6 +252,11 @@ void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb)
 	return (void *)(lksb->lksb_o2dlm.lvb);
 }
 
+void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
+{
+	dlm_print_one_lock(lksb->lksb_o2dlm.lockid);
+}
+
 /*
  * Called from the dlm when it's about to evict a node. This is how the
  * classic stack signals node death.
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 22af77b..01e3c9b 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -91,6 +91,7 @@ int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
 
 int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb);
 void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb);
+void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb);
 
 void o2cb_get_stack(struct ocfs2_locking_protocol *proto);
 void o2cb_put_stack(void);
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 19/62] ocfs2: Clean up stackglue initialization
  2008-04-02 20:14                                   ` [Ocfs2-devel] [PATCH 18/62] ocfs2: Abstract out a debugging function for underlying dlms Mark Fasheh
@ 2008-04-02 20:14                                     ` Mark Fasheh
  2008-04-02 20:14                                       ` [Ocfs2-devel] [PATCH 20/62] ocfs2: Split o2cb code from generic stack functions Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

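The stack glue initialization function needs a better name so that it can be
used cleanly when stackglue becomes a module.  Rename dlmglue_init_stack()
to ocfs2_set_locking_protocol(), which passes the protocol to the new
ocfs2_stack_glue_set_locking_protocol(), and remove dlmglue_exit_stack()
and o2cb_put_stack().  ocfs2_init() now sets the locking protocol later,
after the debugfs root has been created.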
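
Setting the locking protocol amounts to storing a pointer once at module init so that every AST wrapper can reach the filesystem's callbacks. The sketch below assumes the intended precondition is that a non-NULL protocol is supplied, as the removed o2cb_get_stack() asserted; the double-registration check and all `sketch_`/`demo_` names are additions of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ocfs2_locking_protocol. */
struct sketch_locking_protocol {
	void (*lp_lock_ast)(void *astarg);
	void (*lp_unlock_ast)(void *astarg, int error);
};

/* Set once at module init, read by every AST wrapper afterwards. */
static struct sketch_locking_protocol *sketch_lproto;

static void sketch_set_locking_protocol(struct sketch_locking_protocol *proto)
{
	assert(proto != NULL);		/* same check as the old o2cb_get_stack() */
	assert(sketch_lproto == NULL);	/* catch double registration */
	sketch_lproto = proto;
}

static void sketch_lock_ast_wrapper(void *astarg)
{
	assert(sketch_lproto != NULL);	/* mirrors BUG_ON(lproto == NULL) */
	sketch_lproto->lp_lock_ast(astarg);
}

/* Demo filesystem callbacks. */
static int lock_asts_seen;

static void demo_lock_ast(void *astarg)
{
	(void)astarg;
	lock_asts_seen++;
}

static struct sketch_locking_protocol demo_proto = {
	.lp_lock_ast = demo_lock_ast,
};
```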

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |    9 ++-------
 fs/ocfs2/dlmglue.h   |    5 ++---
 fs/ocfs2/stackglue.c |    8 ++------
 fs/ocfs2/stackglue.h |    3 +--
 fs/ocfs2/super.c     |    6 ++----
 5 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index f41ff1c..8a9c849 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3366,16 +3366,11 @@ static struct ocfs2_locking_protocol lproto = {
 	.lp_unlock_ast		= ocfs2_unlock_ast,
 };
 
-/* This interface isn't the final one, hence the less-than-perfect names */
-void dlmglue_init_stack(void)
+void ocfs2_set_locking_protocol(void)
 {
-	o2cb_get_stack(&lproto);
+	ocfs2_stack_glue_set_locking_protocol(&lproto);
 }
 
-void dlmglue_exit_stack(void)
-{
-	o2cb_put_stack();
-}
 
 static void ocfs2_process_blocked_lock(struct ocfs2_super *osb,
 				       struct ocfs2_lock_res *lockres)
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index 2d0a8a0..34b7598 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -114,7 +114,6 @@ void ocfs2_wake_downconvert_thread(struct ocfs2_super *osb);
 struct ocfs2_dlm_debug *ocfs2_new_dlm_debug(void);
 void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug *dlm_debug);
 
-void dlmglue_init_stack(void);
-void dlmglue_exit_stack(void);
-
+/* To set the locking protocol on module initialization */
+void ocfs2_set_locking_protocol(void);
 #endif	/* DLMGLUE_H */
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index bd80541..51c2546 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -429,14 +429,10 @@ int ocfs2_cluster_this_node(unsigned int *node)
 	return 0;
 }
 
-void o2cb_get_stack(struct ocfs2_locking_protocol *proto)
+void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto)
 {
-	BUG_ON(proto == NULL);
+	BUG_ON(proto != NULL);
 
 	lproto = proto;
 }
 
-void o2cb_put_stack(void)
-{
-	lproto = NULL;
-}
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 01e3c9b..decb147 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -93,7 +93,6 @@ int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb);
 void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb);
 void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb);
 
-void o2cb_get_stack(struct ocfs2_locking_protocol *proto);
-void o2cb_put_stack(void);
+void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto);
 
 #endif  /* STACKGLUE_H */
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index fa9c46e..b4a02a0 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -922,8 +922,6 @@ static int __init ocfs2_init(void)
 
 	ocfs2_print_version();
 
-	dlmglue_init_stack();
-
 	status = init_ocfs2_uptodate_cache();
 	if (status < 0) {
 		mlog_errno(status);
@@ -948,6 +946,8 @@ static int __init ocfs2_init(void)
 		mlog(ML_ERROR, "Unable to create ocfs2 debugfs root.\n");
 	}
 
+	ocfs2_set_locking_protocol();
+
 leave:
 	if (status < 0) {
 		ocfs2_free_mem_caches();
@@ -979,8 +979,6 @@ static void __exit ocfs2_exit(void)
 
 	exit_ocfs2_uptodate_cache();
 
-	dlmglue_exit_stack();
-
 	mlog_exit_void();
 }
 
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 20/62] ocfs2: Split o2cb code from generic stack functions.
  2008-04-02 20:14                                     ` [Ocfs2-devel] [PATCH 19/62] ocfs2: Clean up stackglue initialization Mark Fasheh
@ 2008-04-02 20:14                                       ` Mark Fasheh
  2008-04-02 20:14                                         ` [Ocfs2-devel] [PATCH 21/62] ocfs2: Create ocfs2_stack_operations and split out the o2cb stack Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Split the o2cb-specific functionality off from the generic stack glue
calls.  This is a precursor to wrapping the o2cb functionality in an
operations vector.
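
After the split, ownership is divided: the generic half allocates and frees the connection object, while the stack-specific half manages only its private data. A user-space sketch of that division, under the assumption that a hypothetical `struct sketch_conn` with a single `cc_private` field is enough to show it; the `sketch_fail_connect` knob exists only to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down ocfs2_cluster_connection. */
struct sketch_conn {
	void *cc_private;	/* owned by the stack-specific half */
};

static int sketch_fail_connect;	/* test knob: make the stack half fail */

/* Stack-specific half: allocates and frees only cc_private. */
static int sketch_o2cb_connect(struct sketch_conn *conn)
{
	if (sketch_fail_connect)
		return -EINVAL;	/* e.g. local node not heartbeating */

	conn->cc_private = malloc(16);
	return conn->cc_private ? 0 : -ENOMEM;
}

static int sketch_o2cb_disconnect(struct sketch_conn *conn)
{
	free(conn->cc_private);
	conn->cc_private = NULL;
	return 0;
}

/* Generic half: allocates and frees the connection object itself. */
static int sketch_cluster_connect(struct sketch_conn **out)
{
	struct sketch_conn *conn = calloc(1, sizeof(*conn));
	int rc;

	if (!conn)
		return -ENOMEM;

	rc = sketch_o2cb_connect(conn);
	if (rc) {
		free(conn);
		return rc;
	}

	*out = conn;
	return 0;
}

static int sketch_cluster_disconnect(struct sketch_conn *conn)
{
	int ret;

	assert(conn != NULL);
	ret = sketch_o2cb_disconnect(conn);
	if (!ret)
		free(conn);	/* free only once the stack half let go */
	return ret;
}
```

Because each half frees only what it allocated, the stack-specific functions can later be moved behind an operations vector without changing the generic error paths.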

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stackglue.c |  209 ++++++++++++++++++++++++++++++++++----------------
 1 files changed, 144 insertions(+), 65 deletions(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 51c2546..e35dde6 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -197,21 +197,19 @@ static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
 	lproto->lp_unlock_ast(astarg, error);
 }
 
-int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
-		   int mode,
-		   union ocfs2_dlm_lksb *lksb,
-		   u32 flags,
-		   void *name,
-		   unsigned int namelen,
-		   void *astarg)
+static int o2cb_dlm_lock(struct ocfs2_cluster_connection *conn,
+			 int mode,
+			 union ocfs2_dlm_lksb *lksb,
+			 u32 flags,
+			 void *name,
+			 unsigned int namelen,
+			 void *astarg)
 {
 	enum dlm_status status;
 	int o2dlm_mode = mode_to_o2dlm(mode);
 	int o2dlm_flags = flags_to_o2dlm(flags);
 	int ret;
 
-	BUG_ON(lproto == NULL);
-
 	status = dlmlock(conn->cc_lockspace, o2dlm_mode, &lksb->lksb_o2dlm,
 			 o2dlm_flags, name, namelen,
 			 o2dlm_lock_ast_wrapper, astarg,
@@ -220,43 +218,80 @@ int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 	return ret;
 }
 
-int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
-		     union ocfs2_dlm_lksb *lksb,
-		     u32 flags,
-		     void *astarg)
+int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
+		   int mode,
+		   union ocfs2_dlm_lksb *lksb,
+		   u32 flags,
+		   void *name,
+		   unsigned int namelen,
+		   void *astarg)
+{
+	BUG_ON(lproto == NULL);
+
+	return o2cb_dlm_lock(conn, mode, lksb, flags,
+			     name, namelen, astarg);
+}
+
+static int o2cb_dlm_unlock(struct ocfs2_cluster_connection *conn,
+			   union ocfs2_dlm_lksb *lksb,
+			   u32 flags,
+			   void *astarg)
 {
 	enum dlm_status status;
 	int o2dlm_flags = flags_to_o2dlm(flags);
 	int ret;
 
-	BUG_ON(lproto == NULL);
-
 	status = dlmunlock(conn->cc_lockspace, &lksb->lksb_o2dlm,
 			   o2dlm_flags, o2dlm_unlock_ast_wrapper, astarg);
 	ret = dlm_status_to_errno(status);
 	return ret;
 }
 
-int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
+int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
+		     union ocfs2_dlm_lksb *lksb,
+		     u32 flags,
+		     void *astarg)
+{
+	BUG_ON(lproto == NULL);
+
+	return o2cb_dlm_unlock(conn, lksb, flags, astarg);
+}
+
+static int o2cb_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
 {
 	return dlm_status_to_errno(lksb->lksb_o2dlm.status);
 }
 
+int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
+{
+	return o2cb_dlm_lock_status(lksb);
+}
+
 /*
  * Why don't we cast to ocfs2_meta_lvb?  The "clean" answer is that we
  * don't cast at the glue level.  The real answer is that the header
  * ordering is nigh impossible.
  */
-void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb)
+static void *o2cb_dlm_lvb(union ocfs2_dlm_lksb *lksb)
 {
 	return (void *)(lksb->lksb_o2dlm.lvb);
 }
 
-void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
+void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb)
+{
+	return o2cb_dlm_lvb(lksb);
+}
+
+static void o2cb_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
 {
 	dlm_print_one_lock(lksb->lksb_o2dlm.lockid);
 }
 
+void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
+{
+	o2cb_dlm_dump_lksb(lksb);
+}
+
 /*
  * Called from the dlm when it's about to evict a node. This is how the
  * classic stack signals node death.
@@ -271,6 +306,62 @@ static void o2dlm_eviction_cb(int node_num, void *data)
 	conn->cc_recovery_handler(node_num, conn->cc_recovery_data);
 }
 
+static int o2cb_cluster_connect(struct ocfs2_cluster_connection *conn)
+{
+	int rc = 0;
+	u32 dlm_key;
+	struct dlm_ctxt *dlm;
+	struct o2dlm_private *priv;
+	struct dlm_protocol_version dlm_version;
+
+	BUG_ON(conn == NULL);
+
+	/* for now we only have one cluster/node, make sure we see it
+	 * in the heartbeat universe */
+	if (!o2hb_check_local_node_heartbeating()) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	priv = kzalloc(sizeof(struct o2dlm_private), GFP_KERNEL);
+	if (!priv) {
+		rc = -ENOMEM;
+		goto out_free;
+	}
+
+	/* This just fills the structure in.  It is safe to pass conn. */
+	dlm_setup_eviction_cb(&priv->op_eviction_cb, o2dlm_eviction_cb,
+			      conn);
+
+	conn->cc_private = priv;
+
+	/* used by the dlm code to make message headers unique, each
+	 * node in this domain must agree on this. */
+	dlm_key = crc32_le(0, conn->cc_name, conn->cc_namelen);
+	dlm_version.pv_major = conn->cc_version.pv_major;
+	dlm_version.pv_minor = conn->cc_version.pv_minor;
+
+	dlm = dlm_register_domain(conn->cc_name, dlm_key, &dlm_version);
+	if (IS_ERR(dlm)) {
+		rc = PTR_ERR(dlm);
+		mlog_errno(rc);
+		goto out_free;
+	}
+
+	conn->cc_version.pv_major = dlm_version.pv_major;
+	conn->cc_version.pv_minor = dlm_version.pv_minor;
+	conn->cc_lockspace = dlm;
+
+	dlm_register_eviction_cb(dlm, &priv->op_eviction_cb);
+
+out_free:
+	if (rc && conn->cc_private)
+		kfree(conn->cc_private);
+
+out:
+	return rc;
+}
+
 int ocfs2_cluster_connect(const char *group,
 			  int grouplen,
 			  void (*recovery_handler)(int node_num,
@@ -280,10 +371,6 @@ int ocfs2_cluster_connect(const char *group,
 {
 	int rc = 0;
 	struct ocfs2_cluster_connection *new_conn;
-	u32 dlm_key;
-	struct dlm_ctxt *dlm;
-	struct o2dlm_private *priv;
-	struct dlm_protocol_version dlm_version;
 
 	BUG_ON(group == NULL);
 	BUG_ON(conn == NULL);
@@ -294,13 +381,6 @@ int ocfs2_cluster_connect(const char *group,
 		goto out;
 	}
 
-	/* for now we only have one cluster/node, make sure we see it
-	 * in the heartbeat universe */
-	if (!o2hb_check_local_node_heartbeating()) {
-		rc = -EINVAL;
-		goto out;
-	}
-
 	new_conn = kzalloc(sizeof(struct ocfs2_cluster_connection),
 			   GFP_KERNEL);
 	if (!new_conn) {
@@ -316,64 +396,53 @@ int ocfs2_cluster_connect(const char *group,
 	/* Start the new connection at our maximum compatibility level */
 	new_conn->cc_version = lproto->lp_max_version;
 
-	priv = kzalloc(sizeof(struct o2dlm_private), GFP_KERNEL);
-	if (!priv) {
-		rc = -ENOMEM;
-		goto out_free;
-	}
-
-	/* This just fills the structure in.  It is safe to use new_conn. */
-	dlm_setup_eviction_cb(&priv->op_eviction_cb, o2dlm_eviction_cb,
-			      new_conn);
-
-	new_conn->cc_private = priv;
-
-	/* used by the dlm code to make message headers unique, each
-	 * node in this domain must agree on this. */
-	dlm_key = crc32_le(0, group, grouplen);
-	dlm_version.pv_major = new_conn->cc_version.pv_major;
-	dlm_version.pv_minor = new_conn->cc_version.pv_minor;
-
-	dlm = dlm_register_domain(group, dlm_key, &dlm_version);
-	if (IS_ERR(dlm)) {
-		rc = PTR_ERR(dlm);
+	rc = o2cb_cluster_connect(new_conn);
+	if (rc) {
 		mlog_errno(rc);
 		goto out_free;
 	}
 
-	new_conn->cc_version.pv_major = dlm_version.pv_major;
-	new_conn->cc_version.pv_minor = dlm_version.pv_minor;
-	new_conn->cc_lockspace = dlm;
-
-	dlm_register_eviction_cb(dlm, &priv->op_eviction_cb);
-
 	*conn = new_conn;
 
 out_free:
-	if (rc) {
-		kfree(new_conn->cc_private);
+	if (rc)
 		kfree(new_conn);
-	}
 
 out:
 	return rc;
 }
 
 
-int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
+static int o2cb_cluster_disconnect(struct ocfs2_cluster_connection *conn)
 {
 	struct dlm_ctxt *dlm = conn->cc_lockspace;
 	struct o2dlm_private *priv = conn->cc_private;
 
 	dlm_unregister_eviction_cb(&priv->op_eviction_cb);
-	dlm_unregister_domain(dlm);
-
+	conn->cc_private = NULL;
 	kfree(priv);
-	kfree(conn);
+
+	dlm_unregister_domain(dlm);
+	conn->cc_lockspace = NULL;
 
 	return 0;
 }
 
+int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
+{
+	int ret;
+
+	BUG_ON(conn == NULL);
+
+	ret = o2cb_cluster_disconnect(conn);
+
+	/* XXX Should we free it anyway? */
+	if (!ret)
+		kfree(conn);
+
+	return ret;
+}
+
 static void o2hb_stop(const char *group)
 {
 	int ret;
@@ -406,15 +475,20 @@ static void o2hb_stop(const char *group)
  *
  * Other stacks will eventually provide a NULL ->hangup() pointer.
  */
+static void o2cb_cluster_hangup(const char *group, int grouplen)
+{
+	o2hb_stop(group);
+}
+
 void ocfs2_cluster_hangup(const char *group, int grouplen)
 {
 	BUG_ON(group == NULL);
 	BUG_ON(group[grouplen] != '\0');
 
-	o2hb_stop(group);
+	o2cb_cluster_hangup(group, grouplen);
 }
 
-int ocfs2_cluster_this_node(unsigned int *node)
+static int o2cb_cluster_this_node(unsigned int *node)
 {
 	int node_num;
 
@@ -429,6 +503,11 @@ int ocfs2_cluster_this_node(unsigned int *node)
 	return 0;
 }
 
+int ocfs2_cluster_this_node(unsigned int *node)
+{
+	return o2cb_cluster_this_node(node);
+}
+
 void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto)
 {
 	BUG_ON(proto != NULL);
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 21/62] ocfs2: Create ocfs2_stack_operations and split out the o2cb stack.
  2008-04-02 20:14                                       ` [Ocfs2-devel] [PATCH 20/62] ocfs2: Split o2cb code from generic stack functions Mark Fasheh
@ 2008-04-02 20:14                                         ` Mark Fasheh
  2008-04-02 20:14                                           ` [Ocfs2-devel] [PATCH 22/62] ocfs2: Break out stackglue into modules Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Define the ocfs2_stack_operations structure.  Build o2cb_stack_ops from
all of the o2cb-specific stack functions.  Change the generic stack glue
functions to call the stack_ops instead of the o2cb functions directly.
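
The operations-vector pattern reduces each generic glue function to a call through a table of function pointers. A minimal sketch, assuming a hypothetical two-entry table: `struct sketch_stack_operations`, `fake_stack_ops` and `active_stack_ops` stand in for ocfs2_stack_operations, o2cb_stack_ops and whatever pointer the glue actually dispatches through.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down version of struct ocfs2_stack_operations. */
struct sketch_stack_operations {
	int (*this_node)(unsigned int *node);
	int (*lock_status)(int raw_status);
};

/* A fake stack standing in for o2cb_stack_ops. */
static int fake_this_node(unsigned int *node)
{
	*node = 3;
	return 0;
}

static int fake_lock_status(int raw_status)
{
	return raw_status ? -EAGAIN : 0;
}

static const struct sketch_stack_operations fake_stack_ops = {
	.this_node	= fake_this_node,
	.lock_status	= fake_lock_status,
};

/* The generic glue only ever sees the active ops table. */
static const struct sketch_stack_operations *active_stack_ops = &fake_stack_ops;

static int sketch_cluster_this_node(unsigned int *node)
{
	return active_stack_ops->this_node(node);
}

static int sketch_dlm_lock_status(int raw_status)
{
	return active_stack_ops->lock_status(raw_status);
}
```

Supporting another cluster stack then means filling in another table and pointing the glue at it; the generic entry points stay the same.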

The o2cb functions are moved to stack_o2cb.c.  The #include lists are
trimmed so that each file includes only the headers it needs.

For now, stackglue.c and stack_o2cb.c share some extern variables
(stack_o2cb.c uses stack_glue_lproto, for example).  That will change when
they become modules.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/Makefile     |    1 +
 fs/ocfs2/stack_o2cb.c |  395 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/stackglue.c  |  385 ++---------------------------------------------
 fs/ocfs2/stackglue.h  |  123 +++++++++++++++-
 4 files changed, 532 insertions(+), 372 deletions(-)
 create mode 100644 fs/ocfs2/stack_o2cb.c

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index 3ba64af..8e86195 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -25,6 +25,7 @@ ocfs2-objs := \
 	resize.o		\
 	slot_map.o 		\
 	stackglue.o		\
+	stack_o2cb.o		\
 	suballoc.o 		\
 	super.o 		\
 	symlink.o 		\
diff --git a/fs/ocfs2/stack_o2cb.c b/fs/ocfs2/stack_o2cb.c
new file mode 100644
index 0000000..c9bc354
--- /dev/null
+++ b/fs/ocfs2/stack_o2cb.c
@@ -0,0 +1,395 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * stack_o2cb.c
+ *
+ * Code which interfaces ocfs2 with the o2cb stack.
+ *
+ * Copyright (C) 2007 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/crc32.h>
+#include <linux/kmod.h>
+
+/* Needed for AOP_TRUNCATED_PAGE in mlog_errno() */
+#include <linux/fs.h>
+
+#include "cluster/masklog.h"
+#include "cluster/nodemanager.h"
+#include "cluster/heartbeat.h"
+
+#include "stackglue.h"
+
+struct o2dlm_private {
+	struct dlm_eviction_cb op_eviction_cb;
+};
+
+/* These should be identical */
+#if (DLM_LOCK_IV != LKM_IVMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_NL != LKM_NLMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_CR != LKM_CRMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_CW != LKM_CWMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_PR != LKM_PRMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_PW != LKM_PWMODE)
+# error Lock modes do not match
+#endif
+#if (DLM_LOCK_EX != LKM_EXMODE)
+# error Lock modes do not match
+#endif
+static inline int mode_to_o2dlm(int mode)
+{
+	BUG_ON(mode > LKM_MAXMODE);
+
+	return mode;
+}
+
+#define map_flag(_generic, _o2dlm)		\
+	if (flags & (_generic)) {		\
+		flags &= ~(_generic);		\
+		o2dlm_flags |= (_o2dlm);	\
+	}
+static int flags_to_o2dlm(u32 flags)
+{
+	int o2dlm_flags = 0;
+
+	map_flag(DLM_LKF_NOQUEUE, LKM_NOQUEUE);
+	map_flag(DLM_LKF_CANCEL, LKM_CANCEL);
+	map_flag(DLM_LKF_CONVERT, LKM_CONVERT);
+	map_flag(DLM_LKF_VALBLK, LKM_VALBLK);
+	map_flag(DLM_LKF_IVVALBLK, LKM_INVVALBLK);
+	map_flag(DLM_LKF_ORPHAN, LKM_ORPHAN);
+	map_flag(DLM_LKF_FORCEUNLOCK, LKM_FORCE);
+	map_flag(DLM_LKF_TIMEOUT, LKM_TIMEOUT);
+	map_flag(DLM_LKF_LOCAL, LKM_LOCAL);
+
+	/* map_flag() should have cleared every flag passed in */
+	BUG_ON(flags != 0);
+
+	return o2dlm_flags;
+}
+#undef map_flag
+
+/*
+ * Map an o2dlm status to standard errno values.
+ *
+ * o2dlm only uses a handful of these, and returns even fewer to the
+ * caller. Still, we try to assign sane values to each error.
+ *
+ * The following value pairs have special meanings to dlmglue, thus
+ * the right hand side needs to stay unique - never duplicate the
+ * mapping elsewhere in the table!
+ *
+ * DLM_NORMAL:		0
+ * DLM_NOTQUEUED:	-EAGAIN
+ * DLM_CANCELGRANT:	-EBUSY
+ * DLM_CANCEL:		-DLM_ECANCEL
+ */
+/* Keep in sync with dlmapi.h */
+static int status_map[] = {
+	[DLM_NORMAL]			= 0,		/* Success */
+	[DLM_GRANTED]			= -EINVAL,
+	[DLM_DENIED]			= -EACCES,
+	[DLM_DENIED_NOLOCKS]		= -EACCES,
+	[DLM_WORKING]			= -EACCES,
+	[DLM_BLOCKED]			= -EINVAL,
+	[DLM_BLOCKED_ORPHAN]		= -EINVAL,
+	[DLM_DENIED_GRACE_PERIOD]	= -EACCES,
+	[DLM_SYSERR]			= -ENOMEM,	/* It is what it is */
+	[DLM_NOSUPPORT]			= -EPROTO,
+	[DLM_CANCELGRANT]		= -EBUSY,	/* Cancel after grant */
+	[DLM_IVLOCKID]			= -EINVAL,
+	[DLM_SYNC]			= -EINVAL,
+	[DLM_BADTYPE]			= -EINVAL,
+	[DLM_BADRESOURCE]		= -EINVAL,
+	[DLM_MAXHANDLES]		= -ENOMEM,
+	[DLM_NOCLINFO]			= -EINVAL,
+	[DLM_NOLOCKMGR]			= -EINVAL,
+	[DLM_NOPURGED]			= -EINVAL,
+	[DLM_BADARGS]			= -EINVAL,
+	[DLM_VOID]			= -EINVAL,
+	[DLM_NOTQUEUED]			= -EAGAIN,	/* Trylock failed */
+	[DLM_IVBUFLEN]			= -EINVAL,
+	[DLM_CVTUNGRANT]		= -EPERM,
+	[DLM_BADPARAM]			= -EINVAL,
+	[DLM_VALNOTVALID]		= -EINVAL,
+	[DLM_REJECTED]			= -EPERM,
+	[DLM_ABORT]			= -EINVAL,
+	[DLM_CANCEL]			= -DLM_ECANCEL,	/* Successful cancel */
+	[DLM_IVRESHANDLE]		= -EINVAL,
+	[DLM_DEADLOCK]			= -EDEADLK,
+	[DLM_DENIED_NOASTS]		= -EINVAL,
+	[DLM_FORWARD]			= -EINVAL,
+	[DLM_TIMEOUT]			= -ETIMEDOUT,
+	[DLM_IVGROUPID]			= -EINVAL,
+	[DLM_VERS_CONFLICT]		= -EOPNOTSUPP,
+	[DLM_BAD_DEVICE_PATH]		= -ENOENT,
+	[DLM_NO_DEVICE_PERMISSION]	= -EPERM,
+	[DLM_NO_CONTROL_DEVICE]		= -ENOENT,
+	[DLM_RECOVERING]		= -ENOTCONN,
+	[DLM_MIGRATING]			= -ERESTART,
+	[DLM_MAXSTATS]			= -EINVAL,
+};
+
+static int dlm_status_to_errno(enum dlm_status status)
+{
+	BUG_ON(status > (sizeof(status_map) / sizeof(status_map[0])));
+
+	return status_map[status];
+}
+
+static void o2dlm_lock_ast_wrapper(void *astarg)
+{
+	BUG_ON(stack_glue_lproto == NULL);
+
+	stack_glue_lproto->lp_lock_ast(astarg);
+}
+
+static void o2dlm_blocking_ast_wrapper(void *astarg, int level)
+{
+	BUG_ON(stack_glue_lproto == NULL);
+
+	stack_glue_lproto->lp_blocking_ast(astarg, level);
+}
+
+static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
+{
+	int error = dlm_status_to_errno(status);
+
+	BUG_ON(stack_glue_lproto == NULL);
+
+	/*
+	 * In o2dlm, you can get both the lock_ast() for the lock being
+	 * granted and the unlock_ast() for the CANCEL failing.  A
+	 * successful cancel sends DLM_NORMAL here.  If the
+	 * lock grant happened before the cancel arrived, you get
+	 * DLM_CANCELGRANT.
+	 *
+	 * There's no need for the double-ast.  If we see DLM_CANCELGRANT,
+	 * we just ignore it.  We expect the lock_ast() to handle the
+	 * granted lock.
+	 */
+	if (status == DLM_CANCELGRANT)
+		return;
+
+	stack_glue_lproto->lp_unlock_ast(astarg, error);
+}
+
+static int o2cb_dlm_lock(struct ocfs2_cluster_connection *conn,
+			 int mode,
+			 union ocfs2_dlm_lksb *lksb,
+			 u32 flags,
+			 void *name,
+			 unsigned int namelen,
+			 void *astarg)
+{
+	enum dlm_status status;
+	int o2dlm_mode = mode_to_o2dlm(mode);
+	int o2dlm_flags = flags_to_o2dlm(flags);
+	int ret;
+
+	status = dlmlock(conn->cc_lockspace, o2dlm_mode, &lksb->lksb_o2dlm,
+			 o2dlm_flags, name, namelen,
+			 o2dlm_lock_ast_wrapper, astarg,
+			 o2dlm_blocking_ast_wrapper);
+	ret = dlm_status_to_errno(status);
+	return ret;
+}
+
+static int o2cb_dlm_unlock(struct ocfs2_cluster_connection *conn,
+			   union ocfs2_dlm_lksb *lksb,
+			   u32 flags,
+			   void *astarg)
+{
+	enum dlm_status status;
+	int o2dlm_flags = flags_to_o2dlm(flags);
+	int ret;
+
+	status = dlmunlock(conn->cc_lockspace, &lksb->lksb_o2dlm,
+			   o2dlm_flags, o2dlm_unlock_ast_wrapper, astarg);
+	ret = dlm_status_to_errno(status);
+	return ret;
+}
+
+static int o2cb_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
+{
+	return dlm_status_to_errno(lksb->lksb_o2dlm.status);
+}
+
+static void *o2cb_dlm_lvb(union ocfs2_dlm_lksb *lksb)
+{
+	return (void *)(lksb->lksb_o2dlm.lvb);
+}
+
+static void o2cb_dump_lksb(union ocfs2_dlm_lksb *lksb)
+{
+	dlm_print_one_lock(lksb->lksb_o2dlm.lockid);
+}
+
+/*
+ * Called from the dlm when it's about to evict a node. This is how the
+ * classic stack signals node death.
+ */
+static void o2dlm_eviction_cb(int node_num, void *data)
+{
+	struct ocfs2_cluster_connection *conn = data;
+
+	mlog(ML_NOTICE, "o2dlm has evicted node %d from group %.*s\n",
+	     node_num, conn->cc_namelen, conn->cc_name);
+
+	conn->cc_recovery_handler(node_num, conn->cc_recovery_data);
+}
+
+static int o2cb_cluster_connect(struct ocfs2_cluster_connection *conn)
+{
+	int rc = 0;
+	u32 dlm_key;
+	struct dlm_ctxt *dlm;
+	struct o2dlm_private *priv;
+	struct dlm_protocol_version dlm_version;
+
+	BUG_ON(conn == NULL);
+
+	/* for now we only have one cluster/node, make sure we see it
+	 * in the heartbeat universe */
+	if (!o2hb_check_local_node_heartbeating()) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	priv = kzalloc(sizeof(struct o2dlm_private), GFP_KERNEL);
+	if (!priv) {
+		rc = -ENOMEM;
+		goto out_free;
+	}
+
+	/* This just fills the structure in.  It is safe to pass conn. */
+	dlm_setup_eviction_cb(&priv->op_eviction_cb, o2dlm_eviction_cb,
+			      conn);
+
+	conn->cc_private = priv;
+
+	/* used by the dlm code to make message headers unique, each
+	 * node in this domain must agree on this. */
+	dlm_key = crc32_le(0, conn->cc_name, conn->cc_namelen);
+	dlm_version.pv_major = conn->cc_version.pv_major;
+	dlm_version.pv_minor = conn->cc_version.pv_minor;
+
+	dlm = dlm_register_domain(conn->cc_name, dlm_key, &dlm_version);
+	if (IS_ERR(dlm)) {
+		rc = PTR_ERR(dlm);
+		mlog_errno(rc);
+		goto out_free;
+	}
+
+	conn->cc_version.pv_major = dlm_version.pv_major;
+	conn->cc_version.pv_minor = dlm_version.pv_minor;
+	conn->cc_lockspace = dlm;
+
+	dlm_register_eviction_cb(dlm, &priv->op_eviction_cb);
+
+out_free:
+	if (rc && conn->cc_private)
+		kfree(conn->cc_private);
+
+out:
+	return rc;
+}
+
+static int o2cb_cluster_disconnect(struct ocfs2_cluster_connection *conn)
+{
+	struct dlm_ctxt *dlm = conn->cc_lockspace;
+	struct o2dlm_private *priv = conn->cc_private;
+
+	dlm_unregister_eviction_cb(&priv->op_eviction_cb);
+	conn->cc_private = NULL;
+	kfree(priv);
+
+	dlm_unregister_domain(dlm);
+	conn->cc_lockspace = NULL;
+
+	return 0;
+}
+
+static void o2hb_stop(const char *group)
+{
+	int ret;
+	char *argv[5], *envp[3];
+
+	argv[0] = (char *)o2nm_get_hb_ctl_path();
+	argv[1] = "-K";
+	argv[2] = "-u";
+	argv[3] = (char *)group;
+	argv[4] = NULL;
+
+	mlog(0, "Run: %s %s %s %s\n", argv[0], argv[1], argv[2], argv[3]);
+
+	/* minimal command environment taken from cpu_run_sbin_hotplug */
+	envp[0] = "HOME=/";
+	envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
+	envp[2] = NULL;
+
+	ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
+	if (ret < 0)
+		mlog_errno(ret);
+}
+
+/*
+ * Hangup is a hack for tools compatibility.  Older ocfs2-tools software
+ * expects the filesystem to call "ocfs2_hb_ctl" during unmount.  This
+ * happens regardless of whether the DLM got started, so we can't do it
+ * in ocfs2_cluster_disconnect().  We bring the o2hb_stop() function into
+ * the glue and provide a "hangup" API for super.c to call.
+ *
+ * Other stacks will eventually provide a NULL ->hangup() pointer.
+ */
+static void o2cb_cluster_hangup(const char *group, int grouplen)
+{
+	o2hb_stop(group);
+}
+
+static int o2cb_cluster_this_node(unsigned int *node)
+{
+	int node_num;
+
+	node_num = o2nm_this_node();
+	if (node_num == O2NM_INVALID_NODE_NUM)
+		return -ENOENT;
+
+	if (node_num >= O2NM_MAX_NODES)
+		return -EOVERFLOW;
+
+	*node = node_num;
+	return 0;
+}
+
+struct ocfs2_stack_operations o2cb_stack_ops = {
+	.connect	= o2cb_cluster_connect,
+	.disconnect	= o2cb_cluster_disconnect,
+	.hangup		= o2cb_cluster_hangup,
+	.this_node	= o2cb_cluster_this_node,
+	.dlm_lock	= o2cb_dlm_lock,
+	.dlm_unlock	= o2cb_dlm_unlock,
+	.lock_status	= o2cb_dlm_lock_status,
+	.lock_lvb	= o2cb_dlm_lvb,
+	.dump_lksb	= o2cb_dump_lksb,
+};
+
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index e35dde6..e197367 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -19,204 +19,17 @@
  */
 
 #include <linux/slab.h>
-#include <linux/crc32.h>
 #include <linux/kmod.h>
 
 /* Needed for AOP_TRUNCATED_PAGE in mlog_errno() */
 #include <linux/fs.h>
 
 #include "cluster/masklog.h"
-#include "cluster/nodemanager.h"
-#include "cluster/heartbeat.h"
 
 #include "stackglue.h"
 
-static struct ocfs2_locking_protocol *lproto;
-
-struct o2dlm_private {
-	struct dlm_eviction_cb op_eviction_cb;
-};
-
-/* These should be identical */
-#if (DLM_LOCK_IV != LKM_IVMODE)
-# error Lock modes do not match
-#endif
-#if (DLM_LOCK_NL != LKM_NLMODE)
-# error Lock modes do not match
-#endif
-#if (DLM_LOCK_CR != LKM_CRMODE)
-# error Lock modes do not match
-#endif
-#if (DLM_LOCK_CW != LKM_CWMODE)
-# error Lock modes do not match
-#endif
-#if (DLM_LOCK_PR != LKM_PRMODE)
-# error Lock modes do not match
-#endif
-#if (DLM_LOCK_PW != LKM_PWMODE)
-# error Lock modes do not match
-#endif
-#if (DLM_LOCK_EX != LKM_EXMODE)
-# error Lock modes do not match
-#endif
-static inline int mode_to_o2dlm(int mode)
-{
-	BUG_ON(mode > LKM_MAXMODE);
-
-	return mode;
-}
-
-#define map_flag(_generic, _o2dlm)		\
-	if (flags & (_generic)) {		\
-		flags &= ~(_generic);		\
-		o2dlm_flags |= (_o2dlm);	\
-	}
-static int flags_to_o2dlm(u32 flags)
-{
-	int o2dlm_flags = 0;
-
-	map_flag(DLM_LKF_NOQUEUE, LKM_NOQUEUE);
-	map_flag(DLM_LKF_CANCEL, LKM_CANCEL);
-	map_flag(DLM_LKF_CONVERT, LKM_CONVERT);
-	map_flag(DLM_LKF_VALBLK, LKM_VALBLK);
-	map_flag(DLM_LKF_IVVALBLK, LKM_INVVALBLK);
-	map_flag(DLM_LKF_ORPHAN, LKM_ORPHAN);
-	map_flag(DLM_LKF_FORCEUNLOCK, LKM_FORCE);
-	map_flag(DLM_LKF_TIMEOUT, LKM_TIMEOUT);
-	map_flag(DLM_LKF_LOCAL, LKM_LOCAL);
-
-	/* map_flag() should have cleared every flag passed in */
-	BUG_ON(flags != 0);
-
-	return o2dlm_flags;
-}
-#undef map_flag
-
-/*
- * Map an o2dlm status to standard errno values.
- *
- * o2dlm only uses a handful of these, and returns even fewer to the
- * caller. Still, we try to assign sane values to each error.
- *
- * The following value pairs have special meanings to dlmglue, thus
- * the right hand side needs to stay unique - never duplicate the
- * mapping elsewhere in the table!
- *
- * DLM_NORMAL:		0
- * DLM_NOTQUEUED:	-EAGAIN
- * DLM_CANCELGRANT:	-EBUSY
- * DLM_CANCEL:		-DLM_ECANCEL
- */
-/* Keep in sync with dlmapi.h */
-static int status_map[] = {
-	[DLM_NORMAL]			= 0,		/* Success */
-	[DLM_GRANTED]			= -EINVAL,
-	[DLM_DENIED]			= -EACCES,
-	[DLM_DENIED_NOLOCKS]		= -EACCES,
-	[DLM_WORKING]			= -EACCES,
-	[DLM_BLOCKED]			= -EINVAL,
-	[DLM_BLOCKED_ORPHAN]		= -EINVAL,
-	[DLM_DENIED_GRACE_PERIOD]	= -EACCES,
-	[DLM_SYSERR]			= -ENOMEM,	/* It is what it is */
-	[DLM_NOSUPPORT]			= -EPROTO,
-	[DLM_CANCELGRANT]		= -EBUSY,	/* Cancel after grant */
-	[DLM_IVLOCKID]			= -EINVAL,
-	[DLM_SYNC]			= -EINVAL,
-	[DLM_BADTYPE]			= -EINVAL,
-	[DLM_BADRESOURCE]		= -EINVAL,
-	[DLM_MAXHANDLES]		= -ENOMEM,
-	[DLM_NOCLINFO]			= -EINVAL,
-	[DLM_NOLOCKMGR]			= -EINVAL,
-	[DLM_NOPURGED]			= -EINVAL,
-	[DLM_BADARGS]			= -EINVAL,
-	[DLM_VOID]			= -EINVAL,
-	[DLM_NOTQUEUED]			= -EAGAIN,	/* Trylock failed */
-	[DLM_IVBUFLEN]			= -EINVAL,
-	[DLM_CVTUNGRANT]		= -EPERM,
-	[DLM_BADPARAM]			= -EINVAL,
-	[DLM_VALNOTVALID]		= -EINVAL,
-	[DLM_REJECTED]			= -EPERM,
-	[DLM_ABORT]			= -EINVAL,
-	[DLM_CANCEL]			= -DLM_ECANCEL,	/* Successful cancel */
-	[DLM_IVRESHANDLE]		= -EINVAL,
-	[DLM_DEADLOCK]			= -EDEADLK,
-	[DLM_DENIED_NOASTS]		= -EINVAL,
-	[DLM_FORWARD]			= -EINVAL,
-	[DLM_TIMEOUT]			= -ETIMEDOUT,
-	[DLM_IVGROUPID]			= -EINVAL,
-	[DLM_VERS_CONFLICT]		= -EOPNOTSUPP,
-	[DLM_BAD_DEVICE_PATH]		= -ENOENT,
-	[DLM_NO_DEVICE_PERMISSION]	= -EPERM,
-	[DLM_NO_CONTROL_DEVICE]		= -ENOENT,
-	[DLM_RECOVERING]		= -ENOTCONN,
-	[DLM_MIGRATING]			= -ERESTART,
-	[DLM_MAXSTATS]			= -EINVAL,
-};
-
-static int dlm_status_to_errno(enum dlm_status status)
-{
-	BUG_ON(status > (sizeof(status_map) / sizeof(status_map[0])));
+struct ocfs2_locking_protocol *stack_glue_lproto;
 
-	return status_map[status];
-}
-
-static void o2dlm_lock_ast_wrapper(void *astarg)
-{
-	BUG_ON(lproto == NULL);
-
-	lproto->lp_lock_ast(astarg);
-}
-
-static void o2dlm_blocking_ast_wrapper(void *astarg, int level)
-{
-	BUG_ON(lproto == NULL);
-
-	lproto->lp_blocking_ast(astarg, level);
-}
-
-static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
-{
-	int error = dlm_status_to_errno(status);
-
-	BUG_ON(lproto == NULL);
-
-	/*
-	 * In o2dlm, you can get both the lock_ast() for the lock being
-	 * granted and the unlock_ast() for the CANCEL failing.  A
-	 * successful cancel sends DLM_NORMAL here.  If the
-	 * lock grant happened before the cancel arrived, you get
-	 * DLM_CANCELGRANT.
-	 *
-	 * There's no need for the double-ast.  If we see DLM_CANCELGRANT,
-	 * we just ignore it.  We expect the lock_ast() to handle the
-	 * granted lock.
-	 */
-	if (status == DLM_CANCELGRANT)
-		return;
-
-	lproto->lp_unlock_ast(astarg, error);
-}
-
-static int o2cb_dlm_lock(struct ocfs2_cluster_connection *conn,
-			 int mode,
-			 union ocfs2_dlm_lksb *lksb,
-			 u32 flags,
-			 void *name,
-			 unsigned int namelen,
-			 void *astarg)
-{
-	enum dlm_status status;
-	int o2dlm_mode = mode_to_o2dlm(mode);
-	int o2dlm_flags = flags_to_o2dlm(flags);
-	int ret;
-
-	status = dlmlock(conn->cc_lockspace, o2dlm_mode, &lksb->lksb_o2dlm,
-			 o2dlm_flags, name, namelen,
-			 o2dlm_lock_ast_wrapper, astarg,
-			 o2dlm_blocking_ast_wrapper);
-	ret = dlm_status_to_errno(status);
-	return ret;
-}
 
 int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   int mode,
@@ -226,25 +39,10 @@ int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   unsigned int namelen,
 		   void *astarg)
 {
-	BUG_ON(lproto == NULL);
-
-	return o2cb_dlm_lock(conn, mode, lksb, flags,
-			     name, namelen, astarg);
-}
-
-static int o2cb_dlm_unlock(struct ocfs2_cluster_connection *conn,
-			   union ocfs2_dlm_lksb *lksb,
-			   u32 flags,
-			   void *astarg)
-{
-	enum dlm_status status;
-	int o2dlm_flags = flags_to_o2dlm(flags);
-	int ret;
+	BUG_ON(stack_glue_lproto == NULL);
 
-	status = dlmunlock(conn->cc_lockspace, &lksb->lksb_o2dlm,
-			   o2dlm_flags, o2dlm_unlock_ast_wrapper, astarg);
-	ret = dlm_status_to_errno(status);
-	return ret;
+	return o2cb_stack_ops.dlm_lock(conn, mode, lksb, flags,
+				       name, namelen, astarg);
 }
 
 int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
@@ -252,19 +50,14 @@ int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
 		     u32 flags,
 		     void *astarg)
 {
-	BUG_ON(lproto == NULL);
+	BUG_ON(stack_glue_lproto == NULL);
 
-	return o2cb_dlm_unlock(conn, lksb, flags, astarg);
-}
-
-static int o2cb_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
-{
-	return dlm_status_to_errno(lksb->lksb_o2dlm.status);
+	return o2cb_stack_ops.dlm_unlock(conn, lksb, flags, astarg);
 }
 
 int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
 {
-	return o2cb_dlm_lock_status(lksb);
+	return o2cb_stack_ops.lock_status(lksb);
 }
 
 /*
@@ -272,94 +65,14 @@ int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
  * don't cast at the glue level.  The real answer is that the header
  * ordering is nigh impossible.
  */
-static void *o2cb_dlm_lvb(union ocfs2_dlm_lksb *lksb)
-{
-	return (void *)(lksb->lksb_o2dlm.lvb);
-}
-
 void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb)
 {
-	return o2cb_dlm_lvb(lksb);
-}
-
-static void o2cb_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
-{
-	dlm_print_one_lock(lksb->lksb_o2dlm.lockid);
+	return o2cb_stack_ops.lock_lvb(lksb);
 }
 
 void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
 {
-	o2cb_dlm_dump_lksb(lksb);
-}
-
-/*
- * Called from the dlm when it's about to evict a node. This is how the
- * classic stack signals node death.
- */
-static void o2dlm_eviction_cb(int node_num, void *data)
-{
-	struct ocfs2_cluster_connection *conn = data;
-
-	mlog(ML_NOTICE, "o2dlm has evicted node %d from group %.*s\n",
-	     node_num, conn->cc_namelen, conn->cc_name);
-
-	conn->cc_recovery_handler(node_num, conn->cc_recovery_data);
-}
-
-static int o2cb_cluster_connect(struct ocfs2_cluster_connection *conn)
-{
-	int rc = 0;
-	u32 dlm_key;
-	struct dlm_ctxt *dlm;
-	struct o2dlm_private *priv;
-	struct dlm_protocol_version dlm_version;
-
-	BUG_ON(conn == NULL);
-
-	/* for now we only have one cluster/node, make sure we see it
-	 * in the heartbeat universe */
-	if (!o2hb_check_local_node_heartbeating()) {
-		rc = -EINVAL;
-		goto out;
-	}
-
-	priv = kzalloc(sizeof(struct o2dlm_private), GFP_KERNEL);
-	if (!priv) {
-		rc = -ENOMEM;
-		goto out_free;
-	}
-
-	/* This just fills the structure in.  It is safe to pass conn. */
-	dlm_setup_eviction_cb(&priv->op_eviction_cb, o2dlm_eviction_cb,
-			      conn);
-
-	conn->cc_private = priv;
-
-	/* used by the dlm code to make message headers unique, each
-	 * node in this domain must agree on this. */
-	dlm_key = crc32_le(0, conn->cc_name, conn->cc_namelen);
-	dlm_version.pv_major = conn->cc_version.pv_major;
-	dlm_version.pv_minor = conn->cc_version.pv_minor;
-
-	dlm = dlm_register_domain(conn->cc_name, dlm_key, &dlm_version);
-	if (IS_ERR(dlm)) {
-		rc = PTR_ERR(dlm);
-		mlog_errno(rc);
-		goto out_free;
-	}
-
-	conn->cc_version.pv_major = dlm_version.pv_major;
-	conn->cc_version.pv_minor = dlm_version.pv_minor;
-	conn->cc_lockspace = dlm;
-
-	dlm_register_eviction_cb(dlm, &priv->op_eviction_cb);
-
-out_free:
-	if (rc && conn->cc_private)
-		kfree(conn->cc_private);
-
-out:
-	return rc;
+	o2cb_stack_ops.dump_lksb(lksb);
 }
 
 int ocfs2_cluster_connect(const char *group,
@@ -394,9 +107,9 @@ int ocfs2_cluster_connect(const char *group,
 	new_conn->cc_recovery_data = recovery_data;
 
 	/* Start the new connection at our maximum compatibility level */
-	new_conn->cc_version = lproto->lp_max_version;
+	new_conn->cc_version = stack_glue_lproto->lp_max_version;
 
-	rc = o2cb_cluster_connect(new_conn);
+	rc = o2cb_stack_ops.connect(new_conn);
 	if (rc) {
 		mlog_errno(rc);
 		goto out_free;
@@ -412,29 +125,13 @@ out:
 	return rc;
 }
 
-
-static int o2cb_cluster_disconnect(struct ocfs2_cluster_connection *conn)
-{
-	struct dlm_ctxt *dlm = conn->cc_lockspace;
-	struct o2dlm_private *priv = conn->cc_private;
-
-	dlm_unregister_eviction_cb(&priv->op_eviction_cb);
-	conn->cc_private = NULL;
-	kfree(priv);
-
-	dlm_unregister_domain(dlm);
-	conn->cc_lockspace = NULL;
-
-	return 0;
-}
-
 int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
 {
 	int ret;
 
 	BUG_ON(conn == NULL);
 
-	ret = o2cb_cluster_disconnect(conn);
+	ret = o2cb_stack_ops.disconnect(conn);
 
 	/* XXX Should we free it anyway? */
 	if (!ret)
@@ -443,75 +140,23 @@ int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
 	return ret;
 }
 
-static void o2hb_stop(const char *group)
-{
-	int ret;
-	char *argv[5], *envp[3];
-
-	argv[0] = (char *)o2nm_get_hb_ctl_path();
-	argv[1] = "-K";
-	argv[2] = "-u";
-	argv[3] = (char *)group;
-	argv[4] = NULL;
-
-	mlog(0, "Run: %s %s %s %s\n", argv[0], argv[1], argv[2], argv[3]);
-
-	/* minimal command environment taken from cpu_run_sbin_hotplug */
-	envp[0] = "HOME=/";
-	envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
-	envp[2] = NULL;
-
-	ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
-	if (ret < 0)
-		mlog_errno(ret);
-}
-
-/*
- * Hangup is a hack for tools compatibility.  Older ocfs2-tools software
- * expects the filesystem to call "ocfs2_hb_ctl" during unmount.  This
- * happens regardless of whether the DLM got started, so we can't do it
- * in ocfs2_cluster_disconnect().  We bring the o2hb_stop() function into
- * the glue and provide a "hangup" API for super.c to call.
- *
- * Other stacks will eventually provide a NULL ->hangup() pointer.
- */
-static void o2cb_cluster_hangup(const char *group, int grouplen)
-{
-	o2hb_stop(group);
-}
-
 void ocfs2_cluster_hangup(const char *group, int grouplen)
 {
 	BUG_ON(group == NULL);
 	BUG_ON(group[grouplen] != '\0');
 
-	o2cb_cluster_hangup(group, grouplen);
-}
-
-static int o2cb_cluster_this_node(unsigned int *node)
-{
-	int node_num;
-
-	node_num = o2nm_this_node();
-	if (node_num == O2NM_INVALID_NODE_NUM)
-		return -ENOENT;
-
-	if (node_num >= O2NM_MAX_NODES)
-		return -EOVERFLOW;
-
-	*node = node_num;
-	return 0;
+	o2cb_stack_ops.hangup(group, grouplen);
 }
 
 int ocfs2_cluster_this_node(unsigned int *node)
 {
-	return o2cb_cluster_this_node(node);
+	return o2cb_stack_ops.this_node(node);
 }
 
 void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto)
 {
 	BUG_ON(proto != NULL);
 
-	lproto = proto;
+	stack_glue_lproto = proto;
 }
 
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index decb147..0836322 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -25,6 +25,8 @@
 #include <linux/list.h>
 #include <linux/dlmconstants.h>
 
+#include "dlm/dlmapi.h"
+
 /*
  * dlmconstants.h does not have a LOCAL flag.  We hope to remove it
  * some day, but right now we need it.  Let's fake it.  This value is larger
@@ -39,13 +41,18 @@
 #define GROUP_NAME_MAX		64
 
 
-#include "dlm/dlmapi.h"
-
+/*
+ * ocfs2_protocol_version changes when ocfs2 does something different in
+ * its inter-node behavior.  See dlmglue.c for more information.
+ */
 struct ocfs2_protocol_version {
 	u8 pv_major;
 	u8 pv_minor;
 };
 
+/*
+ * The ocfs2_locking_protocol defines the handlers called on ocfs2's behalf.
+ */
 struct ocfs2_locking_protocol {
 	struct ocfs2_protocol_version lp_max_version;
 	void (*lp_lock_ast)(void *astarg);
@@ -53,10 +60,20 @@ struct ocfs2_locking_protocol {
 	void (*lp_unlock_ast)(void *astarg, int error);
 };
 
+/*
+ * A union of all lock status structures.  We define it here so that the
+ * size of the union is known.  Lock status structures are embedded in
+ * ocfs2 inodes.
+ */
 union ocfs2_dlm_lksb {
 	struct dlm_lockstatus lksb_o2dlm;
 };
 
+/*
+ * A cluster connection.  Mostly opaque to ocfs2, the connection holds
+ * state for the underlying stack.  ocfs2 does use cc_version to determine
+ * locking compatibility.
+ */
 struct ocfs2_cluster_connection {
 	char cc_name[GROUP_NAME_MAX];
 	int cc_namelen;
@@ -67,6 +84,106 @@ struct ocfs2_cluster_connection {
 	void *cc_private;
 };
 
+/*
+ * Each cluster stack implements the stack operations structure.  The ocfs2
+ * code does not use it directly; the stackglue code translates generic
+ * cluster calls into stack operations.
+ */
+struct ocfs2_stack_operations {
+	/*
+	 * The fs code calls ocfs2_cluster_connect() to attach a new
+	 * filesystem to the cluster stack.  The ->connect() op is passed
+	 * an ocfs2_cluster_connection with the name and recovery field
+	 * filled in.
+	 *
+	 * The stack must set up any notification mechanisms and create
+	 * the filesystem lockspace in the DLM.  The lockspace should be
+	 * stored on cc_lockspace.  Any other information can be stored on
+	 * cc_private.
+	 *
+	 * ->connect() must not return until it is guaranteed that
+	 *
+	 *  - Node down notifications for the filesystem will be received
+	 *    and passed to conn->cc_recovery_handler().
+	 *  - Locking requests for the filesystem will be processed.
+	 */
+	int (*connect)(struct ocfs2_cluster_connection *conn);
+
+	/*
+	 * The fs code calls ocfs2_cluster_disconnect() when a filesystem
+	 * no longer needs cluster services.  All DLM locks have been
+	 * dropped, and recovery notification is being ignored by the
+	 * fs code.  The stack must disengage from the DLM and discontinue
+	 * recovery notification.
+	 *
+	 * Once ->disconnect() has returned, the connection structure will
+	 * be freed.  Thus, a stack must not return from ->disconnect()
+	 * until it will no longer reference the conn pointer.
+	 */
+	int (*disconnect)(struct ocfs2_cluster_connection *conn);
+
+	/*
+	 * ocfs2_cluster_hangup() exists for compatibility with older
+	 * ocfs2 tools.  Only the classic stack really needs it.  As such
+	 * ->hangup() is not required of all stacks.  See the comment by
+	 * ocfs2_cluster_hangup() for more details.
+	 */
+	void (*hangup)(const char *group, int grouplen);
+
+	/*
+	 * ->this_node() returns the cluster's unique identifier for the
+	 * local node.
+	 */
+	int (*this_node)(unsigned int *node);
+
+	/*
+	 * Call the underlying dlm lock function.  The ->dlm_lock()
+	 * callback should convert the flags and mode as appropriate.
+	 *
+	 * ast and bast functions are not part of the call because the
+	 * stack will likely want to wrap ast and bast calls before passing
+	 * them to stack->sp_proto.
+	 */
+	int (*dlm_lock)(struct ocfs2_cluster_connection *conn,
+			int mode,
+			union ocfs2_dlm_lksb *lksb,
+			u32 flags,
+			void *name,
+			unsigned int namelen,
+			void *astarg);
+
+	/*
+	 * Call the underlying dlm unlock function.  The ->dlm_unlock()
+	 * function should convert the flags as appropriate.
+	 *
+	 * The unlock ast is not passed, as the stack will want to wrap
+	 * it before calling stack->sp_proto->lp_unlock_ast().
+	 */
+	int (*dlm_unlock)(struct ocfs2_cluster_connection *conn,
+			  union ocfs2_dlm_lksb *lksb,
+			  u32 flags,
+			  void *astarg);
+
+	/*
+	 * Return the status of the current lock status block.  The fs
+	 * code should never dereference the union.  The ->lock_status()
+	 * callback pulls out the stack-specific lksb, converts the status
+	 * to a proper errno, and returns it.
+	 */
+	int (*lock_status)(union ocfs2_dlm_lksb *lksb);
+
+	/*
+	 * Pull the lvb pointer off of the stack-specific lksb.
+	 */
+	void *(*lock_lvb)(union ocfs2_dlm_lksb *lksb);
+
+	/*
+	 * This is an optional debugging hook.  If provided, the
+	 * stack can dump debugging information about this lock.
+	 */
+	void (*dump_lksb)(union ocfs2_dlm_lksb *lksb);
+};
+
 int ocfs2_cluster_connect(const char *group,
 			  int grouplen,
 			  void (*recovery_handler)(int node_num,
@@ -95,4 +212,6 @@ void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb);
 
 void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto);
 
+extern struct ocfs2_locking_protocol *stack_glue_lproto;
+extern struct ocfs2_stack_operations o2cb_stack_ops;
 #endif  /* STACKGLUE_H */
-- 
1.5.4.1
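Patch 21 turns each generic ocfs2_* entry point in stackglue.c into a call through the o2cb_stack_ops table. The self-contained userspace sketch below models that dispatch with hypothetical demo_* names (none of them are kernel APIs); the node-number checks mirror o2cb_cluster_this_node(), while the 255-node limit and the -1 "invalid" value are assumptions made for the sketch. Because the ->hangup() comment says not every stack has to provide that hook, the sketch checks for NULL before calling it; the glue code in this patch calls it unconditionally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for O2NM_INVALID_NODE_NUM and O2NM_MAX_NODES. */
#define DEMO_INVALID_NODE_NUM	(-1)
#define DEMO_MAX_NODES		255

/* The generic layer sees only this table of stack operations. */
struct demo_stack_operations {
	int (*this_node)(unsigned int *node);
	void (*hangup)(const char *group, int grouplen);	/* optional */
};

/* Pretend node-manager state; a real stack would query its cluster. */
static int demo_local_node = 3;

static int demo_o2cb_this_node(unsigned int *node)
{
	if (demo_local_node == DEMO_INVALID_NODE_NUM)
		return -ENOENT;
	if (demo_local_node >= DEMO_MAX_NODES)
		return -EOVERFLOW;
	*node = (unsigned int)demo_local_node;
	return 0;
}

/* This stack leaves the optional ->hangup() hook unset. */
static struct demo_stack_operations demo_o2cb_ops = {
	.this_node	= demo_o2cb_this_node,
	.hangup		= NULL,
};

/* Generic entry points dispatch through whichever table is active. */
static struct demo_stack_operations *demo_active_ops = &demo_o2cb_ops;

static int demo_cluster_this_node(unsigned int *node)
{
	return demo_active_ops->this_node(node);
}

static void demo_cluster_hangup(const char *group, int grouplen)
{
	/* Same sanity checks as ocfs2_cluster_hangup(). */
	assert(group != NULL && group[grouplen] == '\0');
	if (demo_active_ops->hangup)
		demo_active_ops->hangup(group, grouplen);
}
```

Supporting another stack then means pointing demo_active_ops at a different table; none of the generic callers change.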


* [Ocfs2-devel] [PATCH 22/62] ocfs2: Break out stackglue into modules.
  2008-04-02 20:14                                         ` [Ocfs2-devel] [PATCH 21/62] ocfs2: Create ocfs2_stack_operations and split out the o2cb stack Mark Fasheh
@ 2008-04-02 20:14                                           ` Mark Fasheh
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

We define the ocfs2_stack_plugin structure to represent a stack driver.
The o2cb stack code is split into stack_o2cb.c.  This becomes the
ocfs2_stack_o2cb.ko module.

The stackglue generic functions are similarly split into the
ocfs2_stackglue.ko module.  This module now provides an interface to
register drivers.  The ocfs2_stack_o2cb driver registers itself.  As
part of this interface, ocfs2_stackglue can load drivers on demand.
This is accomplished in ocfs2_cluster_connect().
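The registration and on-demand loading described above can be modelled in userspace roughly as follows. This is a single-threaded sketch with hypothetical demo_* names: the kernel code holds ocfs2_stack_lock across each step, pins the driver with try_module_get(), and calls request_module("ocfs2_stack_%s", name), whereas here a counter and a pretend loader callback stand in for those. One deliberate difference: the sketch bumps the use count on every successful get, including when the requested stack is already active, so each get is balanced by exactly one put; ocfs2_stack_driver_request() as posted only increments sp_count when it first activates a stack.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_plugin {
	const char *name;
	unsigned int count;		/* active users, like sp_count */
	struct demo_plugin *next;	/* singly linked registry */
};

static struct demo_plugin *demo_registry;
static struct demo_plugin *demo_active;	/* at most one active stack */

/* Hypothetical stand-in for request_module(); may register a plugin. */
static void (*demo_loader)(const char *name);

static struct demo_plugin *demo_lookup(const char *name)
{
	for (struct demo_plugin *p = demo_registry; p; p = p->next)
		if (!strcmp(p->name, name))
			return p;
	return NULL;
}

static int demo_register(struct demo_plugin *p)
{
	if (demo_lookup(p->name))
		return -EEXIST;
	p->count = 0;
	p->next = demo_registry;
	demo_registry = p;
	return 0;
}

static int demo_request(const char *name)
{
	struct demo_plugin *p;

	if (demo_active) {
		/* A different stack is in use; refuse to switch. */
		if (strcmp(demo_active->name, name))
			return -EBUSY;
		demo_active->count++;
		return 0;
	}

	p = demo_lookup(name);
	if (!p)
		return -ENOENT;
	p->count++;
	demo_active = p;
	return 0;
}

/* Look the stack up; on -ENOENT try loading it once, then retry. */
static int demo_driver_get(const char *name)
{
	int rc = demo_request(name);

	if (rc == -ENOENT && demo_loader) {
		demo_loader(name);
		rc = demo_request(name);
	}
	return rc;
}

static void demo_driver_put(void)
{
	assert(demo_active != NULL && demo_active->count > 0);
	if (--demo_active->count == 0)
		demo_active = NULL;	/* kernel: module_put() here */
}

/* Pretend module loader: only knows how to "load" the o2cb plugin. */
static struct demo_plugin demo_o2cb_plugin = { .name = "o2cb" };

static void demo_fake_load(const char *name)
{
	if (!strcmp(name, "o2cb"))
		(void)demo_register(&demo_o2cb_plugin);
}
```

The first get for "o2cb" misses, triggers the loader, and succeeds on the retry; later gets for the same name just add a user, and a request for any other name fails with -EBUSY until the last put.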

ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
If a hangup is pending, it will not release the driver module and will
let _hangup() do that.
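The hangup_pending handshake amounts to deferring one reference drop from disconnect to hangup, so that the stack's operations table is still pinned when ocfs2_cluster_hangup() calls ->hangup(). A minimal userspace sketch, with hypothetical demo_* names and a plain counter standing in for the module reference:

```c
#include <assert.h>
#include <stddef.h>

struct demo_ops {
	void (*hangup)(const char *group);
};

static int demo_hangups_run;

static void demo_o2cb_hangup(const char *group)
{
	(void)group;		/* a real stack would stop heartbeat here */
	demo_hangups_run++;
}

static struct demo_ops demo_o2cb = { .hangup = demo_o2cb_hangup };

static struct demo_ops *demo_active;	/* non-NULL while pinned */
static unsigned int demo_count;

static void demo_get(void)
{
	demo_active = &demo_o2cb;
	demo_count++;
}

static void demo_put(void)
{
	assert(demo_count > 0);
	if (--demo_count == 0)
		demo_active = NULL;	/* kernel: module_put(), ops may go away */
}

/* Drop the reference now unless a hangup will follow. */
static int demo_disconnect(int hangup_pending)
{
	/* stack-specific teardown elided */
	if (!hangup_pending)
		demo_put();
	return 0;
}

/* Only valid after demo_disconnect(1); drops the deferred reference. */
static void demo_hangup(const char *group)
{
	assert(demo_active != NULL);	/* still pinned by the deferred ref */
	if (demo_active->hangup)
		demo_active->hangup(group);
	demo_put();
}
```

Calling demo_hangup() after demo_disconnect(0) trips the assertion, which corresponds to the rule in stackglue.h that ocfs2_cluster_hangup() may only be called if hangup_pending was passed to ocfs2_cluster_disconnect().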

Signed-off-by: Joel Becker <joel.becker@oracle.com>
---
 fs/ocfs2/Makefile     |    7 +-
 fs/ocfs2/dlmglue.c    |    7 +-
 fs/ocfs2/dlmglue.h    |    2 +-
 fs/ocfs2/stack_o2cb.c |   41 +++++++--
 fs/ocfs2/stackglue.c  |  238 ++++++++++++++++++++++++++++++++++++++++++++-----
 fs/ocfs2/stackglue.h  |   36 +++++++-
 fs/ocfs2/super.c      |   16 ++--
 7 files changed, 297 insertions(+), 50 deletions(-)

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index 8e86195..b734254 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -2,7 +2,7 @@ EXTRA_CFLAGS += -Ifs/ocfs2
 
 EXTRA_CFLAGS += -DCATCH_BH_JBD_RACES
 
-obj-$(CONFIG_OCFS2_FS) += ocfs2.o
+obj-$(CONFIG_OCFS2_FS) += ocfs2.o ocfs2_stackglue.o ocfs2_stack_o2cb.o
 
 ocfs2-objs := \
 	alloc.o 		\
@@ -24,8 +24,6 @@ ocfs2-objs := \
 	namei.o 		\
 	resize.o		\
 	slot_map.o 		\
-	stackglue.o		\
-	stack_o2cb.o		\
 	suballoc.o 		\
 	super.o 		\
 	symlink.o 		\
@@ -33,5 +31,8 @@ ocfs2-objs := \
 	uptodate.o		\
 	ver.o
 
+ocfs2_stackglue-objs := stackglue.o
+ocfs2_stack_o2cb-objs := stack_o2cb.o
+
 obj-$(CONFIG_OCFS2_FS) += cluster/
 obj-$(CONFIG_OCFS2_FS) += dlm/
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 8a9c849..f62a9e4 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2641,7 +2641,7 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
 		mlog_errno(status);
 		mlog(ML_ERROR,
 		     "could not find this host's node number\n");
-		ocfs2_cluster_disconnect(conn);
+		ocfs2_cluster_disconnect(conn, 0);
 		goto bail;
 	}
 
@@ -2663,7 +2663,8 @@ bail:
 	return status;
 }
 
-void ocfs2_dlm_shutdown(struct ocfs2_super *osb)
+void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
+			int hangup_pending)
 {
 	mlog_entry_void();
 
@@ -2683,7 +2684,7 @@ void ocfs2_dlm_shutdown(struct ocfs2_super *osb)
 	ocfs2_lock_res_free(&osb->osb_super_lockres);
 	ocfs2_lock_res_free(&osb->osb_rename_lockres);
 
-	ocfs2_cluster_disconnect(osb->cconn);
+	ocfs2_cluster_disconnect(osb->cconn, hangup_pending);
 	osb->cconn = NULL;
 
 	ocfs2_dlm_shutdown_debug(osb);
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index 34b7598..2bb01f0 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -58,7 +58,7 @@ struct ocfs2_meta_lvb {
 #define OCFS2_LOCK_NONBLOCK		(0x04)
 
 int ocfs2_dlm_init(struct ocfs2_super *osb);
-void ocfs2_dlm_shutdown(struct ocfs2_super *osb);
+void ocfs2_dlm_shutdown(struct ocfs2_super *osb, int hangup_pending);
 void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res);
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
 			       enum ocfs2_lock_type type,
diff --git a/fs/ocfs2/stack_o2cb.c b/fs/ocfs2/stack_o2cb.c
index c9bc354..ac1d74c 100644
--- a/fs/ocfs2/stack_o2cb.c
+++ b/fs/ocfs2/stack_o2cb.c
@@ -18,7 +18,7 @@
  */
 
 #include <linux/crc32.h>
-#include <linux/kmod.h>
+#include <linux/module.h>
 
 /* Needed for AOP_TRUNCATED_PAGE in mlog_errno() */
 #include <linux/fs.h>
@@ -33,6 +33,8 @@ struct o2dlm_private {
 	struct dlm_eviction_cb op_eviction_cb;
 };
 
+static struct ocfs2_stack_plugin o2cb_stack;
+
 /* These should be identical */
 #if (DLM_LOCK_IV != LKM_IVMODE)
 # error Lock modes do not match
@@ -158,23 +160,23 @@ static int dlm_status_to_errno(enum dlm_status status)
 
 static void o2dlm_lock_ast_wrapper(void *astarg)
 {
-	BUG_ON(stack_glue_lproto == NULL);
+	BUG_ON(o2cb_stack.sp_proto == NULL);
 
-	stack_glue_lproto->lp_lock_ast(astarg);
+	o2cb_stack.sp_proto->lp_lock_ast(astarg);
 }
 
 static void o2dlm_blocking_ast_wrapper(void *astarg, int level)
 {
-	BUG_ON(stack_glue_lproto == NULL);
+	BUG_ON(o2cb_stack.sp_proto == NULL);
 
-	stack_glue_lproto->lp_blocking_ast(astarg, level);
+	o2cb_stack.sp_proto->lp_blocking_ast(astarg, level);
 }
 
 static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
 {
 	int error = dlm_status_to_errno(status);
 
-	BUG_ON(stack_glue_lproto == NULL);
+	BUG_ON(o2cb_stack.sp_proto == NULL);
 
 	/*
 	 * In o2dlm, you can get both the lock_ast() for the lock being
@@ -190,7 +192,7 @@ static void o2dlm_unlock_ast_wrapper(void *astarg, enum dlm_status status)
 	if (status == DLM_CANCELGRANT)
 		return;
 
-	stack_glue_lproto->lp_unlock_ast(astarg, error);
+	o2cb_stack.sp_proto->lp_unlock_ast(astarg, error);
 }
 
 static int o2cb_dlm_lock(struct ocfs2_cluster_connection *conn,
@@ -267,6 +269,7 @@ static int o2cb_cluster_connect(struct ocfs2_cluster_connection *conn)
 	struct dlm_protocol_version dlm_version;
 
 	BUG_ON(conn == NULL);
+	BUG_ON(o2cb_stack.sp_proto == NULL);
 
 	/* for now we only have one cluster/node, make sure we see it
 	 * in the heartbeat universe */
@@ -314,7 +317,8 @@ out:
 	return rc;
 }
 
-static int o2cb_cluster_disconnect(struct ocfs2_cluster_connection *conn)
+static int o2cb_cluster_disconnect(struct ocfs2_cluster_connection *conn,
+				   int hangup_pending)
 {
 	struct dlm_ctxt *dlm = conn->cc_lockspace;
 	struct o2dlm_private *priv = conn->cc_private;
@@ -393,3 +397,24 @@ struct ocfs2_stack_operations o2cb_stack_ops = {
 	.dump_lksb	= o2cb_dump_lksb,
 };
 
+static struct ocfs2_stack_plugin o2cb_stack = {
+	.sp_name	= "o2cb",
+	.sp_ops		= &o2cb_stack_ops,
+	.sp_owner	= THIS_MODULE,
+};
+
+static int __init o2cb_stack_init(void)
+{
+	return ocfs2_stack_glue_register(&o2cb_stack);
+}
+
+static void __exit o2cb_stack_exit(void)
+{
+	ocfs2_stack_glue_unregister(&o2cb_stack);
+}
+
+MODULE_AUTHOR("Oracle");
+MODULE_DESCRIPTION("ocfs2 driver for the classic o2cb stack");
+MODULE_LICENSE("GPL");
+module_init(o2cb_stack_init);
+module_exit(o2cb_stack_exit);
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index e197367..1978c9c 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -18,17 +18,176 @@
  * General Public License for more details.
  */
 
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/kmod.h>
 
-/* Needed for AOP_TRUNCATED_PAGE in mlog_errno() */
-#include <linux/fs.h>
+#include "stackglue.h"
 
-#include "cluster/masklog.h"
+static struct ocfs2_locking_protocol *lproto;
+static DEFINE_SPINLOCK(ocfs2_stack_lock);
+static LIST_HEAD(ocfs2_stack_list);
 
-#include "stackglue.h"
+/*
+ * The stack currently in use.  If not null, active_stack->sp_count > 0,
+ * the module is pinned, and the locking protocol cannot be changed.
+ */
+static struct ocfs2_stack_plugin *active_stack;
+
+static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
+{
+	struct ocfs2_stack_plugin *p;
+
+	assert_spin_locked(&ocfs2_stack_lock);
+
+	list_for_each_entry(p, &ocfs2_stack_list, sp_list) {
+		if (!strcmp(p->sp_name, name))
+			return p;
+	}
+
+	return NULL;
+}
+
+static int ocfs2_stack_driver_request(const char *name)
+{
+	int rc;
+	struct ocfs2_stack_plugin *p;
+
+	spin_lock(&ocfs2_stack_lock);
+
+	if (active_stack) {
+		/*
+		 * If the active stack isn't the one we want, it cannot
+		 * be selected right now.
+		 */
+		if (!strcmp(active_stack->sp_name, name))
+			rc = 0;
+		else
+			rc = -EBUSY;
+		goto out;
+	}
+
+	p = ocfs2_stack_lookup(name);
+	if (!p || !try_module_get(p->sp_owner)) {
+		rc = -ENOENT;
+		goto out;
+	}
+
+	/* Ok, the stack is pinned */
+	p->sp_count++;
+	active_stack = p;
+
+	rc = 0;
+
+out:
+	spin_unlock(&ocfs2_stack_lock);
+	return rc;
+}
+
+/*
+ * This function looks up the appropriate stack and makes it active.  If
+ * there is no stack, it tries to load it.  It will fail if the stack still
+ * cannot be found.  It will also fail if a different stack is in use.
+ */
+static int ocfs2_stack_driver_get(const char *name)
+{
+	int rc;
+
+	rc = ocfs2_stack_driver_request(name);
+	if (rc == -ENOENT) {
+		request_module("ocfs2_stack_%s", name);
+		rc = ocfs2_stack_driver_request(name);
+	}
+
+	if (rc == -ENOENT) {
+		printk(KERN_ERR
+		       "ocfs2: Cluster stack driver \"%s\" cannot be found\n",
+		       name);
+	} else if (rc == -EBUSY) {
+		printk(KERN_ERR
+		       "ocfs2: A different cluster stack driver is in use\n");
+	}
+
+	return rc;
+}
 
-struct ocfs2_locking_protocol *stack_glue_lproto;
+static void ocfs2_stack_driver_put(void)
+{
+	spin_lock(&ocfs2_stack_lock);
+	BUG_ON(active_stack == NULL);
+	BUG_ON(active_stack->sp_count == 0);
+
+	active_stack->sp_count--;
+	if (!active_stack->sp_count) {
+		module_put(active_stack->sp_owner);
+		active_stack = NULL;
+	}
+	spin_unlock(&ocfs2_stack_lock);
+}
+
+int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin)
+{
+	int rc;
+
+	spin_lock(&ocfs2_stack_lock);
+	if (!ocfs2_stack_lookup(plugin->sp_name)) {
+		plugin->sp_count = 0;
+		plugin->sp_proto = lproto;
+		list_add(&plugin->sp_list, &ocfs2_stack_list);
+		printk(KERN_INFO "ocfs2: Registered cluster interface %s\n",
+		       plugin->sp_name);
+		rc = 0;
+	} else {
+		printk(KERN_ERR "ocfs2: Stack \"%s\" already registered\n",
+		       plugin->sp_name);
+		rc = -EEXIST;
+	}
+	spin_unlock(&ocfs2_stack_lock);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ocfs2_stack_glue_register);
+
+void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin)
+{
+	struct ocfs2_stack_plugin *p;
+
+	spin_lock(&ocfs2_stack_lock);
+	p = ocfs2_stack_lookup(plugin->sp_name);
+	if (p) {
+		BUG_ON(p != plugin);
+		BUG_ON(plugin == active_stack);
+		BUG_ON(plugin->sp_count != 0);
+		list_del_init(&plugin->sp_list);
+		printk(KERN_INFO "ocfs2: Unregistered cluster interface %s\n",
+		       plugin->sp_name);
+	} else {
+		printk(KERN_ERR "ocfs2: Stack \"%s\" is not registered\n",
+		       plugin->sp_name);
+	}
+	spin_unlock(&ocfs2_stack_lock);
+}
+EXPORT_SYMBOL_GPL(ocfs2_stack_glue_unregister);
+
+void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto)
+{
+	struct ocfs2_stack_plugin *p;
+
+	BUG_ON(proto == NULL);
+
+	spin_lock(&ocfs2_stack_lock);
+	BUG_ON(active_stack != NULL);
+
+	lproto = proto;
+	list_for_each_entry(p, &ocfs2_stack_list, sp_list) {
+		p->sp_proto = lproto;
+	}
+
+	spin_unlock(&ocfs2_stack_lock);
+}
+EXPORT_SYMBOL_GPL(ocfs2_stack_glue_set_locking_protocol);
 
 
 int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
@@ -39,26 +198,29 @@ int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   unsigned int namelen,
 		   void *astarg)
 {
-	BUG_ON(stack_glue_lproto == NULL);
+	BUG_ON(lproto == NULL);
 
-	return o2cb_stack_ops.dlm_lock(conn, mode, lksb, flags,
-				       name, namelen, astarg);
+	return active_stack->sp_ops->dlm_lock(conn, mode, lksb, flags,
+					      name, namelen, astarg);
 }
+EXPORT_SYMBOL_GPL(ocfs2_dlm_lock);
 
 int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
 		     union ocfs2_dlm_lksb *lksb,
 		     u32 flags,
 		     void *astarg)
 {
-	BUG_ON(stack_glue_lproto == NULL);
+	BUG_ON(lproto == NULL);
 
-	return o2cb_stack_ops.dlm_unlock(conn, lksb, flags, astarg);
+	return active_stack->sp_ops->dlm_unlock(conn, lksb, flags, astarg);
 }
+EXPORT_SYMBOL_GPL(ocfs2_dlm_unlock);
 
 int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
 {
-	return o2cb_stack_ops.lock_status(lksb);
+	return active_stack->sp_ops->lock_status(lksb);
 }
+EXPORT_SYMBOL_GPL(ocfs2_dlm_lock_status);
 
 /*
  * Why don't we cast to ocfs2_meta_lvb?  The "clean" answer is that we
@@ -67,13 +229,15 @@ int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
  */
 void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb)
 {
-	return o2cb_stack_ops.lock_lvb(lksb);
+	return active_stack->sp_ops->lock_lvb(lksb);
 }
+EXPORT_SYMBOL_GPL(ocfs2_dlm_lvb);
 
 void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
 {
-	o2cb_stack_ops.dump_lksb(lksb);
+	active_stack->sp_ops->dump_lksb(lksb);
 }
+EXPORT_SYMBOL_GPL(ocfs2_dlm_dump_lksb);
 
 int ocfs2_cluster_connect(const char *group,
 			  int grouplen,
@@ -107,11 +271,16 @@ int ocfs2_cluster_connect(const char *group,
 	new_conn->cc_recovery_data = recovery_data;
 
 	/* Start the new connection at our maximum compatibility level */
-	new_conn->cc_version = stack_glue_lproto->lp_max_version;
+	new_conn->cc_version = lproto->lp_max_version;
+
+	/* This will pin the stack driver if successful */
+	rc = ocfs2_stack_driver_get("o2cb");
+	if (rc)
+		goto out_free;
 
-	rc = o2cb_stack_ops.connect(new_conn);
+	rc = active_stack->sp_ops->connect(new_conn);
 	if (rc) {
-		mlog_errno(rc);
+		ocfs2_stack_driver_put();
 		goto out_free;
 	}
 
@@ -124,39 +293,60 @@ out_free:
 out:
 	return rc;
 }
+EXPORT_SYMBOL_GPL(ocfs2_cluster_connect);
 
-int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn)
+/* If hangup_pending is 0, the stack driver will be dropped */
+int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn,
+			     int hangup_pending)
 {
 	int ret;
 
 	BUG_ON(conn == NULL);
 
-	ret = o2cb_stack_ops.disconnect(conn);
+	ret = active_stack->sp_ops->disconnect(conn, hangup_pending);
 
 	/* XXX Should we free it anyway? */
-	if (!ret)
+	if (!ret) {
 		kfree(conn);
+		if (!hangup_pending)
+			ocfs2_stack_driver_put();
+	}
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(ocfs2_cluster_disconnect);
 
 void ocfs2_cluster_hangup(const char *group, int grouplen)
 {
 	BUG_ON(group == NULL);
 	BUG_ON(group[grouplen] != '\0');
 
-	o2cb_stack_ops.hangup(group, grouplen);
+	active_stack->sp_ops->hangup(group, grouplen);
+
+	/* cluster_disconnect() was called with hangup_pending==1 */
+	ocfs2_stack_driver_put();
 }
+EXPORT_SYMBOL_GPL(ocfs2_cluster_hangup);
 
 int ocfs2_cluster_this_node(unsigned int *node)
 {
-	return o2cb_stack_ops.this_node(node);
+	return active_stack->sp_ops->this_node(node);
 }
+EXPORT_SYMBOL_GPL(ocfs2_cluster_this_node);
 
-void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto)
+
+static int __init ocfs2_stack_glue_init(void)
 {
-	BUG_ON(proto != NULL);
+	return 0;
+}
 
-	stack_glue_lproto = proto;
+static void __exit ocfs2_stack_glue_exit(void)
+{
+	lproto = NULL;
 }
 
+MODULE_AUTHOR("Oracle");
+MODULE_DESCRIPTION("ocfs2 cluster stack glue layer");
+MODULE_LICENSE("GPL");
+module_init(ocfs2_stack_glue_init);
+module_exit(ocfs2_stack_glue_exit);
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index 0836322..c96c8bb 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -119,14 +119,21 @@ struct ocfs2_stack_operations {
 	 * Once ->disconnect() has returned, the connection structure will
 	 * be freed.  Thus, a stack must not return from ->disconnect()
 	 * until it will no longer reference the conn pointer.
+	 *
+	 * If hangup_pending is zero, ocfs2_cluster_disconnect() will also
+	 * be dropping the reference on the module.
 	 */
-	int (*disconnect)(struct ocfs2_cluster_connection *conn);
+	int (*disconnect)(struct ocfs2_cluster_connection *conn,
+			  int hangup_pending);
 
 	/*
 	 * ocfs2_cluster_hangup() exists for compatibility with older
 	 * ocfs2 tools.  Only the classic stack really needs it.  As such
 	 * ->hangup() is not required of all stacks.  See the comment by
 	 * ocfs2_cluster_hangup() for more details.
+	 *
+	 * Note that ocfs2_cluster_hangup() can only be called if
+	 * hangup_pending was passed to ocfs2_cluster_disconnect().
 	 */
 	void (*hangup)(const char *group, int grouplen);
 
@@ -184,13 +191,32 @@ struct ocfs2_stack_operations {
 	void (*dump_lksb)(union ocfs2_dlm_lksb *lksb);
 };
 
+/*
+ * Each stack plugin must describe itself by registering an
+ * ocfs2_stack_plugin structure.  This is only seen by stackglue and the
+ * stack driver.
+ */
+struct ocfs2_stack_plugin {
+	char *sp_name;
+	struct ocfs2_stack_operations *sp_ops;
+	struct module *sp_owner;
+
+	/* These are managed by the stackglue code. */
+	struct list_head sp_list;
+	unsigned int sp_count;
+	struct ocfs2_locking_protocol *sp_proto;
+};
+
+
+/* Used by the filesystem */
 int ocfs2_cluster_connect(const char *group,
 			  int grouplen,
 			  void (*recovery_handler)(int node_num,
 						   void *recovery_data),
 			  void *recovery_data,
 			  struct ocfs2_cluster_connection **conn);
-int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn);
+int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn,
+			     int hangup_pending);
 void ocfs2_cluster_hangup(const char *group, int grouplen);
 int ocfs2_cluster_this_node(unsigned int *node);
 
@@ -212,6 +238,8 @@ void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb);
 
 void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto);
 
-extern struct ocfs2_locking_protocol *stack_glue_lproto;
-extern struct ocfs2_stack_operations o2cb_stack_ops;
+
+/* Used by stack plugins */
+int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
+void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
 #endif  /* STACKGLUE_H */
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index b4a02a0..e27a0d4 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1186,7 +1186,7 @@ leave:
 
 static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 {
-	int tmp;
+	int tmp, hangup_needed = 0;
 	struct ocfs2_super *osb = NULL;
 	char nodestr[8];
 
@@ -1225,19 +1225,21 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 
 	ocfs2_release_system_inodes(osb);
 
-	if (osb->cconn)
-		ocfs2_dlm_shutdown(osb);
-
-	debugfs_remove(osb->osb_debug_root);
-
 	/*
-	 * This is a small hack to move ocfs2_hb_ctl into stackglue.
 	 * If we're dismounting due to mount error, mount.ocfs2 will clean
 	 * up heartbeat.  If we're a local mount, there is no heartbeat.
 	 * If we failed before we got a uuid_str yet, we can't stop
 	 * heartbeat.  Otherwise, do it.
 	 */
 	if (!mnt_err && !ocfs2_mount_local(osb) && osb->uuid_str)
+		hangup_needed = 1;
+
+	if (osb->cconn)
+		ocfs2_dlm_shutdown(osb, hangup_needed);
+
+	debugfs_remove(osb->osb_debug_root);
+
+	if (hangup_needed)
 		ocfs2_cluster_hangup(osb->uuid_str, strlen(osb->uuid_str));
 
 	atomic_set(&osb->vol_state, VOLUME_DISMOUNTED);
-- 
1.5.4.1
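The stackglue changes above spread one module reference across three functions: ocfs2_cluster_connect() pins the stack driver, ocfs2_cluster_disconnect() drops that pin only when hangup_pending is 0, and otherwise ocfs2_cluster_hangup() drops it later. The following is a minimal userspace sketch of that pairing, not kernel code. The counter `driver_refs` and the `model_*` functions are hypothetical stand-ins for the reference taken by ocfs2_stack_driver_get() and released by ocfs2_stack_driver_put(); the sketch assumes ->disconnect() succeeds and ignores failure of the driver lookup itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the stack driver's module reference count. */
static int driver_refs;

/* Models ocfs2_cluster_connect(): pin the driver, undo the pin on failure. */
static int model_connect(bool connect_fails)
{
	driver_refs++;			/* ocfs2_stack_driver_get() */
	if (connect_fails) {
		driver_refs--;		/* ocfs2_stack_driver_put() */
		return -1;
	}
	return 0;
}

/* Models a successful ocfs2_cluster_disconnect(conn, hangup_pending). */
static void model_disconnect(bool hangup_pending)
{
	/* With a hangup pending, the reference is kept for model_hangup(). */
	if (!hangup_pending)
		driver_refs--;
}

/* Models ocfs2_cluster_hangup(): drops the reference disconnect kept. */
static void model_hangup(void)
{
	assert(driver_refs > 0);
	driver_refs--;
}
```

Keeping the module pinned between disconnect and hangup is why ocfs2_dismount_volume() now decides hangup_needed before it calls ocfs2_dlm_shutdown().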


* [Ocfs2-devel] [PATCH 23/62] ocfs2: Create stack glue sysfs files.
  2008-04-02 20:14                                           ` [Ocfs2-devel] [PATCH 22/62] ocfs2: Break out stackglue into modules Mark Fasheh
@ 2008-04-02 20:14                                             ` Mark Fasheh
  2008-04-02 20:14                                               ` [Ocfs2-devel] [PATCH 24/62] ocfs2: Add the USERSPACE_STACK incompat bit Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Introduce a set of sysfs files that describe the current stack glue
state.  The files live under /sys/fs/ocfs2.  The max_locking_protocol
file displays the maximum version of ocfs2's locking protocol.  The
loaded_cluster_plugins file lists all of the currently loaded stack
plugins.  While filesystems are mounted, the active_cluster_plugin file
displays the plugin in use.
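The loaded_cluster_plugins file prints one plugin name per line into the single page that sysfs hands to a show() routine. A minimal userspace sketch of that accumulation follows; `format_names()` is a hypothetical helper, and a caller-supplied buffer stands in for the sysfs page. Two details matter: each snprintf() must write at `buf + total` so earlier names are not overwritten, and because snprintf() returns the length it would have written, any return value at or above the remaining space means the output was truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append each name plus '\n' to buf (size bytes).
 * Returns the number of bytes written, or -1 if the output does not fit.
 */
static long format_names(char *buf, size_t size,
			 const char *const *names, size_t count)
{
	size_t total = 0;
	size_t i;

	assert(size > 0);
	buf[0] = '\0';
	for (i = 0; i < count; i++) {
		size_t remain = size - total;
		/* Write after the names already formatted, not at buf[0]. */
		int ret = snprintf(buf + total, remain, "%s\n", names[i]);

		if (ret < 0)
			return -1;
		/* ret is the untruncated length; ret >= remain means cut off. */
		if ((size_t)ret >= remain)
			return -1;
		total += (size_t)ret;
	}
	return (long)total;
}
```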

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stackglue.c |  121 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 120 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 1978c9c..76ae4fc 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -23,6 +23,9 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/kmod.h>
+#include <linux/fs.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 
 #include "stackglue.h"
 
@@ -335,14 +338,130 @@ int ocfs2_cluster_this_node(unsigned int *node)
 EXPORT_SYMBOL_GPL(ocfs2_cluster_this_node);
 
 
-static int __init ocfs2_stack_glue_init(void)
+/*
+ * Sysfs bits
+ */
+
+static ssize_t ocfs2_max_locking_protocol_show(struct kobject *kobj,
+					       struct kobj_attribute *attr,
+					       char *buf)
+{
+	ssize_t ret = 0;
+
+	spin_lock(&ocfs2_stack_lock);
+	if (lproto)
+		ret = snprintf(buf, PAGE_SIZE, "%u.%u\n",
+			       lproto->lp_max_version.pv_major,
+			       lproto->lp_max_version.pv_minor);
+	spin_unlock(&ocfs2_stack_lock);
+
+	return ret;
+}
+
+static struct kobj_attribute ocfs2_attr_max_locking_protocol =
+	__ATTR(max_locking_protocol, S_IFREG | S_IRUGO,
+	       ocfs2_max_locking_protocol_show, NULL);
+
+static ssize_t ocfs2_loaded_cluster_plugins_show(struct kobject *kobj,
+						 struct kobj_attribute *attr,
+						 char *buf)
 {
+	ssize_t ret = 0, total = 0, remain = PAGE_SIZE;
+	struct ocfs2_stack_plugin *p;
+
+	spin_lock(&ocfs2_stack_lock);
+	list_for_each_entry(p, &ocfs2_stack_list, sp_list) {
+		ret = snprintf(buf + total, remain, "%s\n",
+			       p->sp_name);
+		if (ret < 0) {
+			total = ret;
+			break;
+		}
+		if (ret >= remain) {
+			/* snprintf() didn't fit */
+			total = -E2BIG;
+			break;
+		}
+		total += ret;
+		remain -= ret;
+	}
+	spin_unlock(&ocfs2_stack_lock);
+
+	return total;
+}
+
+static struct kobj_attribute ocfs2_attr_loaded_cluster_plugins =
+	__ATTR(loaded_cluster_plugins, S_IFREG | S_IRUGO,
+	       ocfs2_loaded_cluster_plugins_show, NULL);
+
+static ssize_t ocfs2_active_cluster_plugin_show(struct kobject *kobj,
+						struct kobj_attribute *attr,
+						char *buf)
+{
+	ssize_t ret = 0;
+
+	spin_lock(&ocfs2_stack_lock);
+	if (active_stack) {
+		ret = snprintf(buf, PAGE_SIZE, "%s\n",
+			       active_stack->sp_name);
+		if (ret >= PAGE_SIZE)
+			ret = -E2BIG;
+	}
+	spin_unlock(&ocfs2_stack_lock);
+
+	return ret;
+}
+
+static struct kobj_attribute ocfs2_attr_active_cluster_plugin =
+	__ATTR(active_cluster_plugin, S_IFREG | S_IRUGO,
+	       ocfs2_active_cluster_plugin_show, NULL);
+
+static struct attribute *ocfs2_attrs[] = {
+	&ocfs2_attr_max_locking_protocol.attr,
+	&ocfs2_attr_loaded_cluster_plugins.attr,
+	&ocfs2_attr_active_cluster_plugin.attr,
+	NULL,
+};
+
+static struct attribute_group ocfs2_attr_group = {
+	.attrs = ocfs2_attrs,
+};
+
+static struct kset *ocfs2_kset;
+
+static void ocfs2_sysfs_exit(void)
+{
+	kset_unregister(ocfs2_kset);
+}
+
+static int ocfs2_sysfs_init(void)
+{
+	int ret;
+
+	ocfs2_kset = kset_create_and_add("ocfs2", NULL, fs_kobj);
+	if (!ocfs2_kset)
+		return -ENOMEM;
+
+	ret = sysfs_create_group(&ocfs2_kset->kobj, &ocfs2_attr_group);
+	if (ret)
+		goto error;
+
 	return 0;
+
+error:
+	kset_unregister(ocfs2_kset);
+	return ret;
+}
+
+static int __init ocfs2_stack_glue_init(void)
+{
+	return ocfs2_sysfs_init();
 }
 
 static void __exit ocfs2_stack_glue_exit(void)
 {
 	lproto = NULL;
+	ocfs2_sysfs_exit();
 }
 
 MODULE_AUTHOR("Oracle");
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 24/62] ocfs2: Add the USERSPACE_STACK incompat bit.
  2008-04-02 20:14                                             ` [Ocfs2-devel] [PATCH 23/62] ocfs2: Create stack glue sysfs files Mark Fasheh
@ 2008-04-02 20:14                                               ` Mark Fasheh
  2008-04-02 20:14                                                 ` [Ocfs2-devel] [PATCH 25/62] ocfs2: Add the 'cluster_stack' sysfs file Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The filesystem gains the USERSPACE_STACK incompat bit and the
s_cluster_info field on the superblock.  When a userspace stack is in
use, the name of the stack is stored on-disk for mount-time
verification.
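The new superblock field has a fixed on-disk layout, and the comment added to ocfs2_fs.h in this patch argues that everything through s_reserved2 still fits in a 512-byte block. Those offsets can be checked at compile time. This is a sketch using a hypothetical userspace mirror of struct ocfs2_cluster_info in which `__le32` is replaced by `uint32_t`; the `SB_*` and `DINODE_ID2_OFF` macro names are also hypothetical, and natural 4-byte alignment of `uint32_t` is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OCFS2_STACK_LABEL_LEN		4
#define OCFS2_CLUSTER_NAME_LEN		16

/* Hypothetical userspace mirror of struct ocfs2_cluster_info. */
struct cluster_info_mirror {
	uint8_t  ci_stack[OCFS2_STACK_LABEL_LEN];	/* 0x00 */
	uint32_t ci_reserved;				/* 0x04 */
	uint8_t  ci_cluster[OCFS2_CLUSTER_NAME_LEN];	/* 0x08 */
};							/* 0x18 */

static_assert(offsetof(struct cluster_info_mirror, ci_reserved) == 0x04,
	      "ci_reserved offset");
static_assert(offsetof(struct cluster_info_mirror, ci_cluster) == 0x08,
	      "ci_cluster offset");
static_assert(sizeof(struct cluster_info_mirror) == 0x18,
	      "ocfs2_cluster_info size");

/* Offsets relative to ocfs2_dinode.id2, as in the ocfs2_fs.h comments. */
#define SB_CLUSTER_INFO_OFF	0xA0	/* s_cluster_info */
#define SB_RESERVED2_OFF	0xB8	/* s_reserved2[17] */
#define DINODE_ID2_OFF		0xC0	/* id2 within the inode block */

static_assert(SB_CLUSTER_INFO_OFF + sizeof(struct cluster_info_mirror) ==
	      SB_RESERVED2_OFF, "s_reserved2 follows s_cluster_info");

/* Byte offset, within the inode block, just past s_reserved2. */
static size_t superblock_end(void)
{
	return DINODE_ID2_OFF + SB_RESERVED2_OFF + 17 * sizeof(uint64_t);
}
```

superblock_end() works out to 0xC0 + 0x140 = 512, the smallest ocfs2 blocksize, matching the note in the patch.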

The "cluster_stack" option is added to mount(2) processing.  The mount
process needs to pass the matching stack name.  If the passed name and
the on-disk name do not match, the mount fails.

When using the classic o2cb stack, the incompat bit is *not* set and no
mount option is used other than the usual heartbeat=local.  Thus, the
filesystem is compatible with older tools.
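The checks this patch adds are split between ocfs2_verify_heartbeat() and ocfs2_verify_userspace_stack(). A condensed userspace sketch of the resulting decision follows, assuming a read-write clustered mount (local and hard-readonly mounts are left out). `check_stack()` is a hypothetical name, and the sketch assumes ocfs2_parse_options() has already rejected any cluster_stack value that is not exactly four characters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define OCFS2_STACK_LABEL_LEN	4

/*
 * Hypothetical condensed model of the mount-time checks:
 *   userspace_stack - OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK is set on disk
 *   disk_stack      - label copied from s_cluster_info.ci_stack
 *   opt_stack       - value of cluster_stack=, or "" if not given
 *   hb_local        - heartbeat=local was passed
 */
static int check_stack(bool userspace_stack, const char *disk_stack,
		       const char *opt_stack, bool hb_local)
{
	if (!userspace_stack) {
		/* Classic o2cb volume: no stack name, o2cb heartbeat required. */
		if (opt_stack[0])
			return -EINVAL;
		return hb_local ? 0 : -EINVAL;
	}

	/* Userspace stack: o2cb heartbeat arguments are a mistake... */
	if (hb_local)
		return -EINVAL;
	/* ...and the mount option must name the stack recorded on disk. */
	if (strncmp(disk_stack, opt_stack, OCFS2_STACK_LABEL_LEN))
		return -EINVAL;
	return 0;
}
```

For a classic volume the only accepted combination is heartbeat=local with no cluster_stack option, which is what older tools already pass.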

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/ocfs2.h    |    7 ++++
 fs/ocfs2/ocfs2_fs.h |   40 +++++++++++++++++++++-
 fs/ocfs2/super.c    |   90 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index af929ec..9ff5811 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -248,6 +248,7 @@ struct ocfs2_super
 	struct ocfs2_alloc_stats alloc_stats;
 	char dev_str[20];		/* "major,minor" of the device */
 
+	char osb_cluster_stack[OCFS2_STACK_LABEL_LEN + 1];
 	struct ocfs2_cluster_connection *cconn;
 	struct ocfs2_lock_res osb_super_lockres;
 	struct ocfs2_lock_res osb_rename_lockres;
@@ -368,6 +369,12 @@ static inline int ocfs2_is_soft_readonly(struct ocfs2_super *osb)
 	return ret;
 }
 
+static inline int ocfs2_userspace_stack(struct ocfs2_super *osb)
+{
+	return (osb->s_feature_incompat &
+		OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK);
+}
+
 static inline int ocfs2_mount_local(struct ocfs2_super *osb)
 {
 	return (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT);
diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index c495023..52c4266 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -89,7 +89,8 @@
 #define OCFS2_FEATURE_INCOMPAT_SUPP	(OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT \
 					 | OCFS2_FEATURE_INCOMPAT_SPARSE_ALLOC \
 					 | OCFS2_FEATURE_INCOMPAT_INLINE_DATA \
-					 | OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP)
+					 | OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP \
+					 | OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK)
 #define OCFS2_FEATURE_RO_COMPAT_SUPP	OCFS2_FEATURE_RO_COMPAT_UNWRITTEN
 
 /*
@@ -131,6 +132,17 @@
 
 
 /*
+ * Support for alternate, userspace cluster stacks.  If set, the superblock
+ * field s_cluster_info contains a tag for the alternate stack in use as
+ * well as the name of the cluster being joined.
+ * mount.ocfs2 must pass in a matching stack name.
+ *
+ * If not set, the classic stack will be used.  This is compatible with
+ * all older versions.
+ */
+#define OCFS2_FEATURE_INCOMPAT_USERSPACE_STACK	0x0080
+
+/*
  * backup superblock flag is used to indicate that this volume
  * has backup superblocks.
  */
@@ -272,6 +284,10 @@ struct ocfs2_new_group_input {
 #define OCFS2_VOL_UUID_LEN		16
 #define OCFS2_MAX_VOL_LABEL_LEN		64
 
+/* The alternate, userspace stack fields */
+#define OCFS2_STACK_LABEL_LEN		4
+#define OCFS2_CLUSTER_NAME_LEN		16
+
 /* Journal limits (in bytes) */
 #define OCFS2_MIN_JOURNAL_SIZE		(4 * 1024 * 1024)
 
@@ -513,6 +529,13 @@ struct ocfs2_slot_map_extended {
  */
 };
 
+struct ocfs2_cluster_info {
+/*00*/	__u8   ci_stack[OCFS2_STACK_LABEL_LEN];
+	__le32 ci_reserved;
+/*08*/	__u8   ci_cluster[OCFS2_CLUSTER_NAME_LEN];
+/*18*/
+};
+
 /*
  * On disk superblock for OCFS2
  * Note that it is contained inside an ocfs2_dinode, so all offsets
@@ -545,7 +568,20 @@ struct ocfs2_super_block {
 					 * group header */
 /*50*/	__u8  s_label[OCFS2_MAX_VOL_LABEL_LEN];	/* Label for mounting, etc. */
 /*90*/	__u8  s_uuid[OCFS2_VOL_UUID_LEN];	/* 128-bit uuid */
-/*A0*/
+/*A0*/  struct ocfs2_cluster_info s_cluster_info; /* Selected userspace
+						     stack.  Only valid
+						     with INCOMPAT flag. */
+/*B8*/  __le64 s_reserved2[17];		/* Fill out superblock */
+/*140*/
+
+	/*
+	 * NOTE: As stated above, all offsets are relative to
+	 * ocfs2_dinode.id2, which is at 0xC0 in the inode.
+	 * 0xC0 + 0x140 = 0x200 or 512 bytes.  A superblock must fit within
+	 * our smallest blocksize, which is 512 bytes.  To ensure this,
+	 * we reserve the space in s_reserved2.  Anything past s_reserved2
+	 * will not be available on the smallest blocksize.
+	 */
 };
 
 /*
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index e27a0d4..96ebe36 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -87,6 +87,7 @@ struct mount_options
 	unsigned int	atime_quantum;
 	signed short	slot;
 	unsigned int	localalloc_opt;
+	char		cluster_stack[OCFS2_STACK_LABEL_LEN + 1];
 };
 
 static int ocfs2_parse_options(struct super_block *sb, char *options,
@@ -152,6 +153,7 @@ enum {
 	Opt_commit,
 	Opt_localalloc,
 	Opt_localflocks,
+	Opt_stack,
 	Opt_err,
 };
 
@@ -170,6 +172,7 @@ static match_table_t tokens = {
 	{Opt_commit, "commit=%u"},
 	{Opt_localalloc, "localalloc=%d"},
 	{Opt_localflocks, "localflocks"},
+	{Opt_stack, "cluster_stack=%s"},
 	{Opt_err, NULL}
 };
 
@@ -549,8 +552,17 @@ static int ocfs2_verify_heartbeat(struct ocfs2_super *osb)
 		}
 	}
 
+	if (ocfs2_userspace_stack(osb)) {
+		if (osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL) {
+			mlog(ML_ERROR, "Userspace stack expected, but "
+			     "o2cb heartbeat arguments passed to mount\n");
+			return -EINVAL;
+		}
+	}
+
 	if (!(osb->s_mount_opt & OCFS2_MOUNT_HB_LOCAL)) {
-		if (!ocfs2_mount_local(osb) && !ocfs2_is_hard_readonly(osb)) {
+		if (!ocfs2_mount_local(osb) && !ocfs2_is_hard_readonly(osb) &&
+		    !ocfs2_userspace_stack(osb)) {
 			mlog(ML_ERROR, "Heartbeat has to be started to mount "
 			     "a read-write clustered device.\n");
 			return -EINVAL;
@@ -560,6 +572,35 @@ static int ocfs2_verify_heartbeat(struct ocfs2_super *osb)
 	return 0;
 }
 
+/*
+ * If we're using a userspace stack, mount should have passed
+ * a name that matches the disk.  If not, mount should not
+ * have passed a stack.
+ */
+static int ocfs2_verify_userspace_stack(struct ocfs2_super *osb,
+					struct mount_options *mopt)
+{
+	if (!ocfs2_userspace_stack(osb) && mopt->cluster_stack[0]) {
+		mlog(ML_ERROR,
+		     "cluster stack passed to mount, but this filesystem "
+		     "does not support it\n");
+		return -EINVAL;
+	}
+
+	if (ocfs2_userspace_stack(osb) &&
+	    strncmp(osb->osb_cluster_stack, mopt->cluster_stack,
+		    OCFS2_STACK_LABEL_LEN)) {
+		mlog(ML_ERROR,
+		     "cluster stack passed to mount (\"%s\") does not "
+		     "match the filesystem (\"%s\")\n",
+		     mopt->cluster_stack,
+		     osb->osb_cluster_stack);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct dentry *root;
@@ -598,6 +639,10 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 	osb->osb_commit_interval = parsed_options.commit_interval;
 	osb->local_alloc_size = parsed_options.localalloc_opt;
 
+	status = ocfs2_verify_userspace_stack(osb, &parsed_options);
+	if (status)
+		goto read_super_error;
+
 	sb->s_magic = OCFS2_SUPER_MAGIC;
 
 	/* Hard readonly mode only if: bdev_read_only, MS_RDONLY,
@@ -752,6 +797,7 @@ static int ocfs2_parse_options(struct super_block *sb,
 	mopt->atime_quantum = OCFS2_DEFAULT_ATIME_QUANTUM;
 	mopt->slot = OCFS2_INVALID_SLOT;
 	mopt->localalloc_opt = OCFS2_DEFAULT_LOCAL_ALLOC_SIZE;
+	mopt->cluster_stack[0] = '\0';
 
 	if (!options) {
 		status = 1;
@@ -853,6 +899,25 @@ static int ocfs2_parse_options(struct super_block *sb,
 			if (!is_remount)
 				mopt->mount_opt |= OCFS2_MOUNT_LOCALFLOCKS;
 			break;
+		case Opt_stack:
+			/* Check both that the option we were passed
+			 * is of the right length and that it is a proper
+			 * string of the right length.
+			 */
+			if (((args[0].to - args[0].from) !=
+			     OCFS2_STACK_LABEL_LEN) ||
+			    (strnlen(args[0].from,
+				     OCFS2_STACK_LABEL_LEN) !=
+			     OCFS2_STACK_LABEL_LEN)) {
+				mlog(ML_ERROR,
+				     "Invalid cluster_stack option\n");
+				status = 0;
+				goto bail;
+			}
+			memcpy(mopt->cluster_stack, args[0].from,
+			       OCFS2_STACK_LABEL_LEN);
+			mopt->cluster_stack[OCFS2_STACK_LABEL_LEN] = '\0';
+			break;
 		default:
 			mlog(ML_ERROR,
 			     "Unrecognized mount option \"%s\" "
@@ -911,6 +976,10 @@ static int ocfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
 	if (opts & OCFS2_MOUNT_LOCALFLOCKS)
 		seq_printf(s, ",localflocks,");
 
+	if (osb->osb_cluster_stack[0])
+		seq_printf(s, ",cluster_stack=%.*s", OCFS2_STACK_LABEL_LEN,
+			   osb->osb_cluster_stack);
+
 	return 0;
 }
 
@@ -1403,6 +1472,25 @@ static int ocfs2_initialize_super(struct super_block *sb,
 		goto bail;
 	}
 
+	if (ocfs2_userspace_stack(osb)) {
+		memcpy(osb->osb_cluster_stack,
+		       OCFS2_RAW_SB(di)->s_cluster_info.ci_stack,
+		       OCFS2_STACK_LABEL_LEN);
+		osb->osb_cluster_stack[OCFS2_STACK_LABEL_LEN] = '\0';
+		if (strlen(osb->osb_cluster_stack) != OCFS2_STACK_LABEL_LEN) {
+			mlog(ML_ERROR,
+			     "couldn't mount because of an invalid "
+			     "cluster stack label (%s) \n",
+			     osb->osb_cluster_stack);
+			status = -EINVAL;
+			goto bail;
+		}
+	} else {
+		/* The empty string is identical with classic tools that
+		 * don't know about s_cluster_info. */
+		osb->osb_cluster_stack[0] = '\0';
+	}
+
 	get_random_bytes(&osb->s_next_generation, sizeof(u32));
 
 	/* FIXME
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 25/62] ocfs2: Add the 'cluster_stack' sysfs file.
  2008-04-02 20:14                                               ` [Ocfs2-devel] [PATCH 24/62] ocfs2: Add the USERSPACE_STACK incompat bit Mark Fasheh
@ 2008-04-02 20:14                                                 ` Mark Fasheh
  2008-04-02 20:14                                                   ` [Ocfs2-devel] [PATCH 26/62] ocfs2: Add the user stack module Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Userspace can now query and specify the cluster stack in use via the
/sys/fs/ocfs2/cluster_stack file.  By default, it is 'o2cb', which is
the classic stack.  Thus, old tools that do not know how to modify this
file will work just fine.  The stack cannot be modified if there is a
live filesystem.
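Writes to /sys/fs/ocfs2/cluster_stack follow a few rules in ocfs2_cluster_stack_store(): a trailing newline (as added by echo) is ignored, the value must be exactly OCFS2_STACK_LABEL_LEN (4) characters, and once a stack is active only the current name may be written back. The sketch below restates those rules in userspace. `stack_active` is a hypothetical flag standing in for `active_stack != NULL`, and there is no locking because the sketch is single-threaded.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() and ssize_t */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

#define OCFS2_STACK_LABEL_LEN	4

/* Hypothetical model of the stackglue state behind the sysfs file. */
static char cluster_stack_name[OCFS2_STACK_LABEL_LEN + 1] = "o2cb";
static bool stack_active;	/* stands in for active_stack != NULL */

static ssize_t store_cluster_stack(const char *buf, size_t count)
{
	size_t len = count;

	assert(buf != NULL || count == 0);
	if (len == 0)
		return 0;
	/* echo(1) appends a newline; it is not part of the label. */
	if (buf[len - 1] == '\n')
		len--;
	/* Exactly four non-NUL characters are accepted. */
	if (len != OCFS2_STACK_LABEL_LEN || strnlen(buf, len) != len)
		return -EINVAL;

	if (stack_active)	/* rewriting the same name is harmless */
		return strncmp(buf, cluster_stack_name, len) ?
			-EBUSY : (ssize_t)count;

	memcpy(cluster_stack_name, buf, len);
	cluster_stack_name[len] = '\0';
	return (ssize_t)count;
}
```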

ocfs2_cluster_connect() now takes the expected cluster stack as an
argument.  This way, the filesystem and the stack glue ensure they are
speaking to the same backend.

If the stack is 'o2cb', the o2cb stack plugin is used.  For any other
value, the "user" stack plugin (which uses fs/dlm) is selected.
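The mapping from a stack label to a plugin is small enough to restate. The following userspace sketch mirrors the decision in ocfs2_stack_driver_get(): an empty name means the classic stack, and any other valid 4-character label selects the "user" plugin, after which the kernel would call request_module("ocfs2_stack_%s", plugin_name) if the plugin is not yet loaded. `plugin_for_stack()` is a hypothetical name, and the sketch leaves out the separate check in ocfs2_stack_driver_request() that the name must also match /sys/fs/ocfs2/cluster_stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OCFS2_STACK_LABEL_LEN		4
#define OCFS2_STACK_PLUGIN_O2CB		"o2cb"
#define OCFS2_STACK_PLUGIN_USER		"user"

/*
 * Hypothetical helper: the plugin name ocfs2_stack_driver_get() would
 * select for stack_name, or NULL if the label is invalid (-EINVAL).
 */
static const char *plugin_for_stack(const char *stack_name)
{
	/* The classic stack passes no name; older tools rely on this. */
	if (!stack_name || !*stack_name)
		stack_name = OCFS2_STACK_PLUGIN_O2CB;

	if (strlen(stack_name) != OCFS2_STACK_LABEL_LEN)
		return NULL;

	/* Anything that isn't the classic stack is a user stack. */
	if (strcmp(stack_name, OCFS2_STACK_PLUGIN_O2CB))
		return OCFS2_STACK_PLUGIN_USER;
	return OCFS2_STACK_PLUGIN_O2CB;
}
```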

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlmglue.c   |    3 +-
 fs/ocfs2/stackglue.c |  111 +++++++++++++++++++++++++++++++++++++++++++++-----
 fs/ocfs2/stackglue.h |    3 +-
 3 files changed, 104 insertions(+), 13 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index f62a9e4..394d25a 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2627,7 +2627,8 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
 	}
 
 	/* for now, uuid == domain */
-	status = ocfs2_cluster_connect(osb->uuid_str,
+	status = ocfs2_cluster_connect(osb->osb_cluster_stack,
+				       osb->uuid_str,
 				       strlen(osb->uuid_str),
 				       ocfs2_do_node_down, osb,
 				       &conn);
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 76ae4fc..bf45d9b 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -27,11 +27,17 @@
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
 
+#include "ocfs2_fs.h"
+
 #include "stackglue.h"
 
+#define OCFS2_STACK_PLUGIN_O2CB		"o2cb"
+#define OCFS2_STACK_PLUGIN_USER		"user"
+
 static struct ocfs2_locking_protocol *lproto;
 static DEFINE_SPINLOCK(ocfs2_stack_lock);
 static LIST_HEAD(ocfs2_stack_list);
+static char cluster_stack_name[OCFS2_STACK_LABEL_LEN + 1];
 
 /*
  * The stack currently in use.  If not null, active_stack->sp_count > 0,
@@ -53,26 +59,36 @@ static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
 	return NULL;
 }
 
-static int ocfs2_stack_driver_request(const char *name)
+static int ocfs2_stack_driver_request(const char *stack_name,
+				      const char *plugin_name)
 {
 	int rc;
 	struct ocfs2_stack_plugin *p;
 
 	spin_lock(&ocfs2_stack_lock);
 
+	/*
+	 * If the stack passed by the filesystem isn't the selected one,
+	 * we can't continue.
+	 */
+	if (strcmp(stack_name, cluster_stack_name)) {
+		rc = -EBUSY;
+		goto out;
+	}
+
 	if (active_stack) {
 		/*
 		 * If the active stack isn't the one we want, it cannot
 		 * be selected right now.
 		 */
-		if (!strcmp(active_stack->sp_name, name))
+		if (!strcmp(active_stack->sp_name, plugin_name))
 			rc = 0;
 		else
 			rc = -EBUSY;
 		goto out;
 	}
 
-	p = ocfs2_stack_lookup(name);
+	p = ocfs2_stack_lookup(plugin_name);
 	if (!p || !try_module_get(p->sp_owner)) {
 		rc = -ENOENT;
 		goto out;
@@ -94,23 +110,42 @@ out:
  * there is no stack, it tries to load it.  It will fail if the stack still
  * cannot be found.  It will also fail if a different stack is in use.
  */
-static int ocfs2_stack_driver_get(const char *name)
+static int ocfs2_stack_driver_get(const char *stack_name)
 {
 	int rc;
+	char *plugin_name = OCFS2_STACK_PLUGIN_O2CB;
+
+	/*
+	 * Classic stack does not pass in a stack name.  This is
+	 * compatible with older tools as well.
+	 */
+	if (!stack_name || !*stack_name)
+		stack_name = OCFS2_STACK_PLUGIN_O2CB;
+
+	if (strlen(stack_name) != OCFS2_STACK_LABEL_LEN) {
+		printk(KERN_ERR
+		       "ocfs2 passed an invalid cluster stack label: \"%s\"\n",
+		       stack_name);
+		return -EINVAL;
+	}
 
-	rc = ocfs2_stack_driver_request(name);
+	/* Anything that isn't the classic stack is a user stack */
+	if (strcmp(stack_name, OCFS2_STACK_PLUGIN_O2CB))
+		plugin_name = OCFS2_STACK_PLUGIN_USER;
+
+	rc = ocfs2_stack_driver_request(stack_name, plugin_name);
 	if (rc == -ENOENT) {
-		request_module("ocfs2_stack_%s", name);
-		rc = ocfs2_stack_driver_request(name);
+		request_module("ocfs2_stack_%s", plugin_name);
+		rc = ocfs2_stack_driver_request(stack_name, plugin_name);
 	}
 
 	if (rc == -ENOENT) {
 		printk(KERN_ERR
 		       "ocfs2: Cluster stack driver \"%s\" cannot be found\n",
-		       name);
+		       plugin_name);
 	} else if (rc == -EBUSY) {
 		printk(KERN_ERR
-		       "ocfs2: A different cluster stack driver is in use\n");
+		       "ocfs2: A different cluster stack is in use\n");
 	}
 
 	return rc;
@@ -242,7 +277,8 @@ void ocfs2_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
 }
 EXPORT_SYMBOL_GPL(ocfs2_dlm_dump_lksb);
 
-int ocfs2_cluster_connect(const char *group,
+int ocfs2_cluster_connect(const char *stack_name,
+			  const char *group,
 			  int grouplen,
 			  void (*recovery_handler)(int node_num,
 						   void *recovery_data),
@@ -277,7 +313,7 @@ int ocfs2_cluster_connect(const char *group,
 	new_conn->cc_version = lproto->lp_max_version;
 
 	/* This will pin the stack driver if successful */
-	rc = ocfs2_stack_driver_get("o2cb");
+	rc = ocfs2_stack_driver_get(stack_name);
 	if (rc)
 		goto out_free;
 
@@ -416,10 +452,61 @@ static struct kobj_attribute ocfs2_attr_active_cluster_plugin =
 	__ATTR(active_cluster_plugin, S_IFREG | S_IRUGO,
 	       ocfs2_active_cluster_plugin_show, NULL);
 
+static ssize_t ocfs2_cluster_stack_show(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	ssize_t ret;
+	spin_lock(&ocfs2_stack_lock);
+	ret = snprintf(buf, PAGE_SIZE, "%s\n", cluster_stack_name);
+	spin_unlock(&ocfs2_stack_lock);
+
+	return ret;
+}
+
+static ssize_t ocfs2_cluster_stack_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	size_t len = count;
+	ssize_t ret;
+
+	if (len == 0)
+		return len;
+
+	if (buf[len - 1] == '\n')
+		len--;
+
+	if ((len != OCFS2_STACK_LABEL_LEN) ||
+	    (strnlen(buf, len) != len))
+		return -EINVAL;
+
+	spin_lock(&ocfs2_stack_lock);
+	if (active_stack) {
+		if (!strncmp(buf, cluster_stack_name, len))
+			ret = count;
+		else
+			ret = -EBUSY;
+	} else {
+		memcpy(cluster_stack_name, buf, len);
+		ret = count;
+	}
+	spin_unlock(&ocfs2_stack_lock);
+
+	return ret;
+}
+
+
+static struct kobj_attribute ocfs2_attr_cluster_stack =
+	__ATTR(cluster_stack, S_IFREG | S_IRUGO | S_IWUSR,
+	       ocfs2_cluster_stack_show,
+	       ocfs2_cluster_stack_store);
+
 static struct attribute *ocfs2_attrs[] = {
 	&ocfs2_attr_max_locking_protocol.attr,
 	&ocfs2_attr_loaded_cluster_plugins.attr,
 	&ocfs2_attr_active_cluster_plugin.attr,
+	&ocfs2_attr_cluster_stack.attr,
 	NULL,
 };
 
@@ -455,6 +542,8 @@ error:
 
 static int __init ocfs2_stack_glue_init(void)
 {
+	strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);
+
 	return ocfs2_sysfs_init();
 }
 
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index c96c8bb..d88bc65 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -209,7 +209,8 @@ struct ocfs2_stack_plugin {
 
 
 /* Used by the filesystem */
-int ocfs2_cluster_connect(const char *group,
+int ocfs2_cluster_connect(const char *stack_name,
+			  const char *group,
 			  int grouplen,
 			  void (*recovery_handler)(int node_num,
 						   void *recovery_data),
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 26/62] ocfs2: Add the user stack module.
  2008-04-02 20:14                                                 ` [Ocfs2-devel] [PATCH 25/62] ocfs2: Add the 'cluster_stack' sysfs file Mark Fasheh
@ 2008-04-02 20:14                                                   ` Mark Fasheh
  2008-04-02 20:14                                                     ` [Ocfs2-devel] [PATCH 27/62] ocfs2: Add the ocfs2_control misc device Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Add a skeleton for the stack_user module.  It's just the barebones module
code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stack_user.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)
 create mode 100644 fs/ocfs2/stack_user.c

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
new file mode 100644
index 0000000..920eb11
--- /dev/null
+++ b/fs/ocfs2/stack_user.c
@@ -0,0 +1,38 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * stack_user.c
+ *
+ * Code which interfaces ocfs2 with fs/dlm and a userspace stack.
+ *
+ * Copyright (C) 2007 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+
+#include "stackglue.h"
+
+
+static int __init user_stack_init(void)
+{
+	return 0;
+}
+
+static void __exit user_stack_exit(void)
+{
+}
+
+MODULE_AUTHOR("Oracle");
+MODULE_DESCRIPTION("ocfs2 driver for userspace cluster stacks");
+MODULE_LICENSE("GPL");
+module_init(user_stack_init);
+module_exit(user_stack_exit);
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 27/62] ocfs2: Add the ocfs2_control misc device.
  2008-04-02 20:14                                                   ` [Ocfs2-devel] [PATCH 26/62] ocfs2: Add the user stack module Mark Fasheh
@ 2008-04-02 20:14                                                     ` Mark Fasheh
  2008-04-02 20:14                                                       ` [Ocfs2-devel] [PATCH 28/62] ocfs2: Start the ocfs2_control handshake Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The ocfs2_control misc device is how a userspace control daemon (controld)
talks to the filesystem.  Introduce bare-bones file operations for it.
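In this patch ocfs2_control_read() and ocfs2_control_write() are still stubs; the handshake is only described in the comment at the top of stack_user.c. Based on that comment, a client reads newline-separated protocol tags until EOF, where each tag is one character followed by a two-hex-digit version (currently only T01), and then writes the tag it wants. A minimal client-side sketch of the tag check follows. `tag_offered()` and `tag_is_well_formed()` are hypothetical helpers, and since the exact bytes written back are not specified here, the write step is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* A tag is one character followed by two hex digits, e.g. "T01". */
static bool tag_is_well_formed(const char *tag, size_t len)
{
	return len == 3 && isxdigit((unsigned char)tag[1]) &&
	       isxdigit((unsigned char)tag[2]);
}

/*
 * Hypothetical client helper: scan the newline-separated tags read from
 * the control device and report whether "want" was offered.  Malformed
 * lines are skipped.
 */
static bool tag_offered(const char *tags, const char *want)
{
	const char *line = tags;

	while (*line) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);

		if (tag_is_well_formed(line, len) &&
		    len == strlen(want) && !strncmp(line, want, len))
			return true;
		if (!nl)
			break;
		line = nl + 1;
	}
	return false;
}
```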

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stack_user.c |  184 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 183 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index 920eb11..fdca5d3 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -18,17 +18,199 @@
  */
 
 #include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/mutex.h>
+#include <linux/reboot.h>
 
 #include "stackglue.h"
 
 
-static int __init user_stack_init(void)
+/*
+ * The control protocol starts with a handshake.  Until the handshake
+ * is complete, the control device will fail all write(2)s.
+ *
+ * The handshake is simple.  First, the client reads until EOF.  Each line
+ * of output is a supported protocol tag.  All protocol tags are a single
+ * character followed by a two hex digit version number.  Currently the
+ * only supported tag is T01, for "Text-based version 0x01".  Next, the
+ * client writes the version they would like to use.  If the version tag
+ * written is unknown, -EINVAL is returned.  Once the negotiation is
+ * complete, the client can start sending messages.
+ */
+
+/*
+ * ocfs2_live_connection is refcounted because the filesystem and
+ * miscdevice sides can detach in different order.  Let's just be safe.
+ */
+struct ocfs2_live_connection {
+	struct list_head		oc_list;
+	struct ocfs2_cluster_connection	*oc_conn;
+};
+
+static atomic_t ocfs2_control_opened;
+
+static LIST_HEAD(ocfs2_live_connection_list);
+static DEFINE_MUTEX(ocfs2_control_lock);
+
+static struct ocfs2_live_connection *ocfs2_connection_find(const char *name)
+{
+	size_t len = strlen(name);
+	struct ocfs2_live_connection *c;
+
+	BUG_ON(!mutex_is_locked(&ocfs2_control_lock));
+
+	list_for_each_entry(c, &ocfs2_live_connection_list, oc_list) {
+		if ((c->oc_conn->cc_namelen == len) &&
+		    !strncmp(c->oc_conn->cc_name, name, len))
+			return c;
+	}
+
+	return NULL;
+}
+
+/*
+ * ocfs2_live_connection structures are created underneath the ocfs2
+ * mount path.  Since the VFS prevents multiple calls to
+ * fill_super(), we can't get dupes here.
+ */
+static int ocfs2_live_connection_new(struct ocfs2_cluster_connection *conn,
+				     struct ocfs2_live_connection **c_ret)
+{
+	int rc = 0;
+	struct ocfs2_live_connection *c;
+
+	c = kzalloc(sizeof(struct ocfs2_live_connection), GFP_KERNEL);
+	if (!c)
+		return -ENOMEM;
+
+	mutex_lock(&ocfs2_control_lock);
+	c->oc_conn = conn;
+
+	if (atomic_read(&ocfs2_control_opened))
+		list_add(&c->oc_list, &ocfs2_live_connection_list);
+	else {
+		printk(KERN_ERR
+		       "ocfs2: Userspace control daemon is not present\n");
+		rc = -ESRCH;
+	}
+
+	mutex_unlock(&ocfs2_control_lock);
+
+	if (!rc)
+		*c_ret = c;
+	else
+		kfree(c);
+
+	return rc;
+}
+
+/*
+ * This function disconnects the cluster connection from ocfs2_control.
+ * Afterwards, userspace can't affect the cluster connection.
+ */
+static void ocfs2_live_connection_drop(struct ocfs2_live_connection *c)
+{
+	mutex_lock(&ocfs2_control_lock);
+	list_del_init(&c->oc_list);
+	c->oc_conn = NULL;
+	mutex_unlock(&ocfs2_control_lock);
+
+	kfree(c);
+}
+
+
+static ssize_t ocfs2_control_write(struct file *file,
+				   const char __user *buf,
+				   size_t count,
+				   loff_t *ppos)
 {
 	return 0;
 }
 
+static ssize_t ocfs2_control_read(struct file *file,
+				  char __user *buf,
+				  size_t count,
+				  loff_t *ppos)
+{
+	return 0;
+}
+
+static int ocfs2_control_release(struct inode *inode, struct file *file)
+{
+	if (atomic_dec_and_test(&ocfs2_control_opened)) {
+		mutex_lock(&ocfs2_control_lock);
+		if (!list_empty(&ocfs2_live_connection_list)) {
+			/* XXX: Do bad things! */
+			printk(KERN_ERR
+			       "ocfs2: Unexpected release of ocfs2_control!\n"
+			       "       Loss of cluster connection requires "
+			       "an emergency restart!\n");
+			emergency_restart();
+		}
+		mutex_unlock(&ocfs2_control_lock);
+	}
+
+	return 0;
+}
+
+static int ocfs2_control_open(struct inode *inode, struct file *file)
+{
+	atomic_inc(&ocfs2_control_opened);
+
+	return 0;
+}
+
+static const struct file_operations ocfs2_control_fops = {
+	.open    = ocfs2_control_open,
+	.release = ocfs2_control_release,
+	.read    = ocfs2_control_read,
+	.write   = ocfs2_control_write,
+	.owner   = THIS_MODULE,
+};
+
+static struct miscdevice ocfs2_control_device = {
+	.minor		= MISC_DYNAMIC_MINOR,
+	.name		= "ocfs2_control",
+	.fops		= &ocfs2_control_fops,
+};
+
+static int ocfs2_control_init(void)
+{
+	int rc;
+
+	atomic_set(&ocfs2_control_opened, 0);
+
+	rc = misc_register(&ocfs2_control_device);
+	if (rc)
+		printk(KERN_ERR
+		       "ocfs2: Unable to register ocfs2_control device "
+		       "(errno %d)\n",
+		       -rc);
+
+	return rc;
+}
+
+static void ocfs2_control_exit(void)
+{
+	int rc;
+
+	rc = misc_deregister(&ocfs2_control_device);
+	if (rc)
+		printk(KERN_ERR
+		       "ocfs2: Unable to deregister ocfs2_control device "
+		       "(errno %d)\n",
+		       -rc);
+}
+
+static int __init user_stack_init(void)
+{
+	return ocfs2_control_init();
+}
+
 static void __exit user_stack_exit(void)
 {
+	ocfs2_control_exit();
 }
 
 MODULE_AUTHOR("Oracle");
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 28/62] ocfs2: Start the ocfs2_control handshake.
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

When a control daemon opens the ocfs2_control device, it must perform a
handshake to tell the filesystem it is something capable of monitoring
cluster status.  Only after the handshake is complete will the filesystem
allow mounts.
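
The handshake this patch implements can be modelled as a small state machine. The sketch below is a userspace approximation, assuming `memcpy()` in place of `copy_to_user()` and a hypothetical `struct hs_file` in place of `struct file` and `struct ocfs2_control_private`. It mirrors the INVALID, READ and VALID states and the partial-read handling of `*ppos`.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define PROTO     "T01\n"
#define PROTO_LEN 4

enum hs_state { HS_INVALID, HS_READ, HS_VALID };

/* Hypothetical per-open state: handshake state plus read offset. */
struct hs_file {
	enum hs_state state;
	long pos;                 /* models *ppos */
};

/* Models ocfs2_control_read(): hand out the protocol list, then EOF. */
static ssize_t hs_read(struct hs_file *f, char *buf, size_t count)
{
	size_t n;

	if (f->pos >= PROTO_LEN)
		return 0;                 /* EOF */
	n = PROTO_LEN - f->pos;
	if (n > count)
		n = count;
	memcpy(buf, PROTO + f->pos, n);
	f->pos += n;
	if (f->pos >= PROTO_LEN)
		f->state = HS_READ;       /* whole list consumed */
	return n;
}

/* Models the handshake branch of ocfs2_control_write(). */
static ssize_t hs_write(struct hs_file *f, const char *buf, size_t count)
{
	switch (f->state) {
	case HS_INVALID:
		return -EINVAL;           /* list not fully read yet */
	case HS_READ:
		if (count != PROTO_LEN || strncmp(buf, PROTO, PROTO_LEN))
			return -EINVAL;   /* state stays READ */
		f->state = HS_VALID;
		return count;
	case HS_VALID:
	default:
		return count;             /* later messages: not modelled */
	}
}
```

As in the patch, a rejected protocol write leaves the state at READ, so the client may retry with another tag.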

This is the first part of the handshake.  The daemon reads all supported
ocfs2_control protocols, then writes in the protocol it will use.
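
On the daemon side, this step amounts to reading the device until EOF and replying with a tag it understands. A minimal sketch, assuming the whole list has already been read into memory; `pick_protocol()` is a hypothetical helper, not part of ocfs2-tools.

```c
#include <stddef.h>
#include <string.h>

/*
 * Scan the newline-separated tag list read from /dev/ocfs2_control and
 * return the reply to write(2), or NULL if no supported tag was offered.
 */
static const char *pick_protocol(const char *list, size_t len)
{
	static const char wanted[] = "T01";   /* only protocol we speak */
	const char *line = list;
	const char *end = list + len;

	while (line < end) {
		const char *nl = memchr(line, '\n', end - line);
		size_t linelen = nl ? (size_t)(nl - line) : (size_t)(end - line);

		if (linelen == 3 && memcmp(line, wanted, 3) == 0)
			return "T01\n";       /* tag plus newline: 4 bytes */
		if (!nl)
			break;                /* last line had no newline */
		line = nl + 1;
	}
	return NULL;
}
```

The reply has to go out in a single write(2) of exactly four bytes, newline included, because the kernel rejects any write whose length differs from the expected message length.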

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stack_user.c |  144 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 139 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index fdca5d3..ff8d307 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -22,6 +22,7 @@
 #include <linux/miscdevice.h>
 #include <linux/mutex.h>
 #include <linux/reboot.h>
+#include <asm/uaccess.h>
 
 #include "stackglue.h"
 
@@ -40,6 +41,16 @@
  */
 
 /*
+ * Whether or not the client has done the handshake.
+ * For now, we have just one protocol version.
+ */
+#define OCFS2_CONTROL_PROTO			"T01\n"
+#define OCFS2_CONTROL_PROTO_LEN			4
+#define OCFS2_CONTROL_HANDSHAKE_INVALID		(0)
+#define OCFS2_CONTROL_HANDSHAKE_READ		(1)
+#define OCFS2_CONTROL_HANDSHAKE_VALID		(2)
+
+/*
  * ocfs2_live_connection is refcounted because the filesystem and
  * miscdevice sides can detach in different order.  Let's just be safe.
  */
@@ -48,11 +59,30 @@ struct ocfs2_live_connection {
 	struct ocfs2_cluster_connection	*oc_conn;
 };
 
+struct ocfs2_control_private {
+	struct list_head op_list;
+	int op_state;
+};
+
 static atomic_t ocfs2_control_opened;
 
 static LIST_HEAD(ocfs2_live_connection_list);
+static LIST_HEAD(ocfs2_control_private_list);
 static DEFINE_MUTEX(ocfs2_control_lock);
 
+static inline void ocfs2_control_set_handshake_state(struct file *file,
+						     int state)
+{
+	struct ocfs2_control_private *p = file->private_data;
+	p->op_state = state;
+}
+
+static inline int ocfs2_control_get_handshake_state(struct file *file)
+{
+	struct ocfs2_control_private *p = file->private_data;
+	return p->op_state;
+}
+
 static struct ocfs2_live_connection *ocfs2_connection_find(const char *name)
 {
 	size_t len = strlen(name);
@@ -119,27 +149,115 @@ static void ocfs2_live_connection_drop(struct ocfs2_live_connection *c)
 	kfree(c);
 }
 
+static ssize_t ocfs2_control_cfu(char *target, size_t target_len,
+				 const char __user *buf, size_t count)
+{
+	/* The T01 expects write(2) calls to have exactly one command */
+	if (count != target_len)
+		return -EINVAL;
+
+	if (copy_from_user(target, buf, target_len))
+		return -EFAULT;
+
+	return count;
+}
+
+static ssize_t ocfs2_control_validate_handshake(struct file *file,
+						const char __user *buf,
+						size_t count)
+{
+	ssize_t ret;
+	char kbuf[OCFS2_CONTROL_PROTO_LEN];
+
+	ret = ocfs2_control_cfu(kbuf, OCFS2_CONTROL_PROTO_LEN,
+				buf, count);
+	if (ret != count)
+		return ret;
+
+	if (strncmp(kbuf, OCFS2_CONTROL_PROTO, OCFS2_CONTROL_PROTO_LEN))
+		return -EINVAL;
+
+	atomic_inc(&ocfs2_control_opened);
+	ocfs2_control_set_handshake_state(file,
+					  OCFS2_CONTROL_HANDSHAKE_VALID);
+
+
+	return count;
+}
+
 
 static ssize_t ocfs2_control_write(struct file *file,
 				   const char __user *buf,
 				   size_t count,
 				   loff_t *ppos)
 {
-	return 0;
+	ssize_t ret;
+
+	switch (ocfs2_control_get_handshake_state(file)) {
+		case OCFS2_CONTROL_HANDSHAKE_INVALID:
+			ret = -EINVAL;
+			break;
+
+		case OCFS2_CONTROL_HANDSHAKE_READ:
+			ret = ocfs2_control_validate_handshake(file, buf,
+							       count);
+			break;
+
+		case OCFS2_CONTROL_HANDSHAKE_VALID:
+			ret = count;  /* XXX */
+			break;
+
+		default:
+			BUG();
+			ret = -EIO;
+			break;
+	}
+
+	return ret;
 }
 
+/*
+ * This is a naive version.  If we ever have a new protocol, we'll expand
+ * it.  Probably using seq_file.
+ */
 static ssize_t ocfs2_control_read(struct file *file,
 				  char __user *buf,
 				  size_t count,
 				  loff_t *ppos)
 {
-	return 0;
+	char *proto_string = OCFS2_CONTROL_PROTO;
+	size_t to_write = 0;
+
+	if (*ppos >= OCFS2_CONTROL_PROTO_LEN)
+		return 0;
+
+	to_write = OCFS2_CONTROL_PROTO_LEN - *ppos;
+	if (to_write > count)
+		to_write = count;
+	if (copy_to_user(buf, proto_string + *ppos, to_write))
+		return -EFAULT;
+
+	*ppos += to_write;
+
+	/* Have we read the whole protocol list? */
+	if (*ppos >= OCFS2_CONTROL_PROTO_LEN)
+		ocfs2_control_set_handshake_state(file,
+						  OCFS2_CONTROL_HANDSHAKE_READ);
+
+	return to_write;
 }
 
 static int ocfs2_control_release(struct inode *inode, struct file *file)
 {
+	struct ocfs2_control_private *p = file->private_data;
+
+	mutex_lock(&ocfs2_control_lock);
+
+	if (ocfs2_control_get_handshake_state(file) !=
+	    OCFS2_CONTROL_HANDSHAKE_VALID)
+		goto out;
+
 	if (atomic_dec_and_test(&ocfs2_control_opened)) {
-		mutex_lock(&ocfs2_control_lock);
 		if (!list_empty(&ocfs2_live_connection_list)) {
 			/* XXX: Do bad things! */
 			printk(KERN_ERR
@@ -148,15 +266,31 @@ static int ocfs2_control_release(struct inode *inode, struct file *file)
 			       "an emergency restart!\n");
 			emergency_restart();
 		}
-		mutex_unlock(&ocfs2_control_lock);
 	}
 
+out:
+	list_del_init(&p->op_list);
+	file->private_data = NULL;
+
+	mutex_unlock(&ocfs2_control_lock);
+
+	kfree(p);
+
 	return 0;
 }
 
 static int ocfs2_control_open(struct inode *inode, struct file *file)
 {
-	atomic_inc(&ocfs2_control_opened);
+	struct ocfs2_control_private *p;
+
+	p = kzalloc(sizeof(struct ocfs2_control_private), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	mutex_lock(&ocfs2_control_lock);
+	file->private_data = p;
+	list_add(&p->op_list, &ocfs2_control_private_list);
+	mutex_unlock(&ocfs2_control_lock);
 
 	return 0;
 }
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 29/62] ocfs2: Introduce the DOWN message to ocfs2_control
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

When the control daemon sees a node go down, it sends a DOWN message
through the ocfs2_control device.
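
A daemon might build the DOWN message roughly as follows. This is a sketch with a hypothetical `format_down()` helper. It insists on upper-case hex because the message format calls for a capitalized UUID and the kernel compares the UUID against the connection name with `strncmp()` in `ocfs2_connection_find()`. Assuming that name is stored in upper case, a lower-case UUID would match nothing.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define DOWN_MSG_LEN 47   /* "DOWN" + ' ' + 32 + ' ' + 8 + '\n' */

/*
 * Build "DOWN <uuid> <nodenum>\n" into @buf, which must have room for
 * DOWN_MSG_LEN bytes plus a NUL.  Returns 47, or -1 on bad input.
 */
static int format_down(char *buf, size_t buflen, const char *uuid,
		       int nodenum)
{
	size_t i;

	if (buflen < DOWN_MSG_LEN + 1 || nodenum < 0 || strlen(uuid) != 32)
		return -1;
	for (i = 0; i < 32; i++)
		if (!isxdigit((unsigned char)uuid[i]) ||
		    islower((unsigned char)uuid[i]))
			return -1;    /* must be upper-case hex */

	/* %08X keeps the node field at exactly 8 characters. */
	return snprintf(buf, buflen, "DOWN %s %08X\n", uuid,
			(unsigned int)nodenum);
}
```

The 47 bytes should then be sent in one write(2), without the trailing NUL.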

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stack_user.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index ff8d307..a5e58e2 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -35,9 +35,21 @@
  * of output is a supported protocol tag.  All protocol tags are a single
  * character followed by a two hex digit version number.  Currently the
  * only thing supported is T01, for "Text-based version 0x01".  Next, the
- * client writes the version they would like to use.  If the version tag
- * written is unknown, -EINVAL is returned.  Once the negotiation is
- * complete, the client can start sending messages.
+ * client writes the version they would like to use, including the newline.
+ * Thus, the protocol tag is 'T01\n'.  If the version tag written is
+ * unknown, -EINVAL is returned.  Once the negotiation is complete, the
+ * client can start sending messages.
+ *
+ * The T01 protocol only has one message, "DOWN".  It has the following
+ * syntax:
+ *
+ *  DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline>
+ *
+ * eg:
+ *
+ *  DOWN 632A924FDD844190BDA93C0DF6B94899 00000001\n
+ *
+ * This is 47 characters.
  */
 
 /*
@@ -49,6 +61,11 @@
 #define OCFS2_CONTROL_HANDSHAKE_INVALID		(0)
 #define OCFS2_CONTROL_HANDSHAKE_READ		(1)
 #define OCFS2_CONTROL_HANDSHAKE_VALID		(2)
+#define OCFS2_CONTROL_MESSAGE_DOWN		"DOWN"
+#define OCFS2_CONTROL_MESSAGE_DOWN_LEN		4
+#define OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN	47
+#define OCFS2_TEXT_UUID_LEN			32
+#define OCFS2_CONTROL_MESSAGE_NODENUM_LEN	8
 
 /*
  * ocfs2_live_connection is refcounted because the filesystem and
@@ -149,7 +166,7 @@ static void ocfs2_live_connection_drop(struct ocfs2_live_connection *c)
 	kfree(c);
 }
 
-static ssize_t ocfs2_control_cfu(char *target, size_t target_len,
+static ssize_t ocfs2_control_cfu(void *target, size_t target_len,
 				 const char __user *buf, size_t count)
 {
 	/* The T01 expects write(2) calls to have exactly one command */
@@ -185,6 +202,73 @@ static ssize_t ocfs2_control_validate_handshake(struct file *file,
 	return count;
 }
 
+static void ocfs2_control_send_down(const char *uuid,
+				    int nodenum)
+{
+	struct ocfs2_live_connection *c;
+
+	mutex_lock(&ocfs2_control_lock);
+
+	c = ocfs2_connection_find(uuid);
+	if (c) {
+		BUG_ON(c->oc_conn == NULL);
+		c->oc_conn->cc_recovery_handler(nodenum,
+						c->oc_conn->cc_recovery_data);
+	}
+
+	mutex_unlock(&ocfs2_control_lock);
+}
+
+/* DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline> */
+struct ocfs2_control_message_down {
+	char	tag[OCFS2_CONTROL_MESSAGE_DOWN_LEN];
+	char	space1;
+	char	uuid[OCFS2_TEXT_UUID_LEN];
+	char	space2;
+	char	nodestr[OCFS2_CONTROL_MESSAGE_NODENUM_LEN];
+	char	newline;
+};
+
+static ssize_t ocfs2_control_message(struct file *file,
+				     const char __user *buf,
+				     size_t count)
+{
+	ssize_t ret;
+	char *p = NULL;
+	long nodenum;
+	struct ocfs2_control_message_down msg;
+
+	/* Try to catch padding issues */
+	WARN_ON(offsetof(struct ocfs2_control_message_down, uuid) !=
+		(sizeof(msg.tag) + sizeof(msg.space1)));
+
+	memset(&msg, 0, sizeof(struct ocfs2_control_message_down));
+	ret = ocfs2_control_cfu(&msg, OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN,
+				buf, count);
+	if (ret != count)
+		return ret;
+
+	if (strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_DOWN,
+		    strlen(OCFS2_CONTROL_MESSAGE_DOWN)))
+		return -EINVAL;
+
+	if ((msg.space1 != ' ') || (msg.space2 != ' ') ||
+	    (msg.newline != '\n'))
+		return -EINVAL;
+	msg.space1 = msg.space2 = msg.newline = '\0';
+
+	nodenum = simple_strtol(msg.nodestr, &p, 16);
+	if (!p || *p)
+		return -EINVAL;
+
+	if ((nodenum == LONG_MIN) || (nodenum == LONG_MAX) ||
+	    (nodenum > INT_MAX) || (nodenum < 0))
+		return -ERANGE;
+
+	ocfs2_control_send_down(msg.uuid, nodenum);
+
+	return count;
+}
 
 static ssize_t ocfs2_control_write(struct file *file,
 				   const char __user *buf,
@@ -204,7 +288,7 @@ static ssize_t ocfs2_control_write(struct file *file,
 			break;
 
 		case OCFS2_CONTROL_HANDSHAKE_VALID:
-			ret = count;  /* XXX */
+			ret = ocfs2_control_message(file, buf, count);
 			break;
 
 		default:
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 30/62] ocfs2: Add the local node id to the handshake.
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

This is the second part of the ocfs2_control handshake.  After
negotiating the ocfs2_control protocol, the daemon tells the filesystem
what the local node id is via the SETN message.
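
The kernel validates SETN by overlaying a fixed-layout struct on the written bytes. The following userspace model approximates `ocfs2_control_do_setnode_msg()` for illustration only. `parse_setn()` is a hypothetical name, and `strtol()` stands in for `simple_strtol()`, so it may accept some inputs (such as leading whitespace) that the kernel helper rejects.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Same byte layout as struct ocfs2_control_message_setn. */
struct setn_msg {
	char tag[4];       /* "SETN" */
	char space;        /* ' ' */
	char nodestr[8];   /* hex node number, not NUL-terminated */
	char newline;      /* '\n' */
};
_Static_assert(sizeof(struct setn_msg) == 14, "SETN must be 14 bytes");

/* Return the node number from a raw write, or a negative errno. */
static long parse_setn(const char *buf, size_t count)
{
	struct setn_msg msg;
	char num[sizeof(msg.nodestr) + 1];
	char *end;
	long nodenum;

	/* One write(2) must carry exactly one message. */
	if (count != sizeof(msg))
		return -EINVAL;
	memcpy(&msg, buf, sizeof(msg));

	if (strncmp(msg.tag, "SETN", sizeof(msg.tag)) || msg.space != ' ' ||
	    msg.newline != '\n')
		return -EINVAL;

	/* Copy the digits out so strtol() sees a NUL-terminated string. */
	memcpy(num, msg.nodestr, sizeof(msg.nodestr));
	num[sizeof(msg.nodestr)] = '\0';

	errno = 0;
	nodenum = strtol(num, &end, 16);
	if (end == num || *end)
		return -EINVAL;           /* non-hex characters */
	if (errno == ERANGE || nodenum < 0 || nodenum > INT_MAX)
		return -ERANGE;
	return nodenum;
}
```

Unlike the patch, which NUL-terminates `nodestr` in place by overwriting the newline, the sketch copies the digits into a separate buffer so the parser never reads past the end of the `nodestr` array.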

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stack_user.c |  222 ++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 173 insertions(+), 49 deletions(-)

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index a5e58e2..43e6105 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -40,8 +40,18 @@
  * unknown, -EINVAL is returned.  Once the negotiation is complete, the
  * client can start sending messages.
  *
- * The T01 protocol only has one message, "DOWN".  It has the following
- * syntax:
+ * The T01 protocol only has two messages.  First is the "SETN" message.
+ * It has the following syntax:
+ *
+ *  SETN<space><8-char-hex-nodenum><newline>
+ *
+ * This is 14 characters.
+ *
+ * The "SETN" message must be the first message following the protocol.
+ * It tells ocfs2_control the local node number.
+ *
+ * Once the local node number has been set, the "DOWN" message can be
+ * sent for node down notification.  It has the following syntax:
  *
  *  DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline>
  *
@@ -58,11 +68,18 @@
  */
 #define OCFS2_CONTROL_PROTO			"T01\n"
 #define OCFS2_CONTROL_PROTO_LEN			4
+
+/* Handshake states */
 #define OCFS2_CONTROL_HANDSHAKE_INVALID		(0)
 #define OCFS2_CONTROL_HANDSHAKE_READ		(1)
-#define OCFS2_CONTROL_HANDSHAKE_VALID		(2)
-#define OCFS2_CONTROL_MESSAGE_DOWN		"DOWN"
-#define OCFS2_CONTROL_MESSAGE_DOWN_LEN		4
+#define OCFS2_CONTROL_HANDSHAKE_PROTOCOL	(2)
+#define OCFS2_CONTROL_HANDSHAKE_VALID		(3)
+
+/* Messages */
+#define OCFS2_CONTROL_MESSAGE_OP_LEN		4
+#define OCFS2_CONTROL_MESSAGE_SETNODE_OP	"SETN"
+#define OCFS2_CONTROL_MESSAGE_SETNODE_TOTAL_LEN	14
+#define OCFS2_CONTROL_MESSAGE_DOWN_OP		"DOWN"
 #define OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN	47
 #define OCFS2_TEXT_UUID_LEN			32
 #define OCFS2_CONTROL_MESSAGE_NODENUM_LEN	8
@@ -79,9 +96,35 @@ struct ocfs2_live_connection {
 struct ocfs2_control_private {
 	struct list_head op_list;
 	int op_state;
+	int op_this_node;
+};
+
+/* SETN<space><8-char-hex-nodenum><newline> */
+struct ocfs2_control_message_setn {
+	char	tag[OCFS2_CONTROL_MESSAGE_OP_LEN];
+	char	space;
+	char	nodestr[OCFS2_CONTROL_MESSAGE_NODENUM_LEN];
+	char	newline;
+};
+
+/* DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline> */
+struct ocfs2_control_message_down {
+	char	tag[OCFS2_CONTROL_MESSAGE_OP_LEN];
+	char	space1;
+	char	uuid[OCFS2_TEXT_UUID_LEN];
+	char	space2;
+	char	nodestr[OCFS2_CONTROL_MESSAGE_NODENUM_LEN];
+	char	newline;
+};
+
+union ocfs2_control_message {
+	char					tag[OCFS2_CONTROL_MESSAGE_OP_LEN];
+	struct ocfs2_control_message_setn	u_setn;
+	struct ocfs2_control_message_down	u_down;
 };
 
 static atomic_t ocfs2_control_opened;
+static int ocfs2_control_this_node = -1;
 
 static LIST_HEAD(ocfs2_live_connection_list);
 static LIST_HEAD(ocfs2_control_private_list);
@@ -166,38 +209,37 @@ static void ocfs2_live_connection_drop(struct ocfs2_live_connection *c)
 	kfree(c);
 }
 
-static ssize_t ocfs2_control_cfu(void *target, size_t target_len,
-				 const char __user *buf, size_t count)
+static int ocfs2_control_cfu(void *target, size_t target_len,
+			     const char __user *buf, size_t count)
 {
 	/* The T01 expects write(2) calls to have exactly one command */
-	if (count != target_len)
+	if ((count != target_len) ||
+	    (count > sizeof(union ocfs2_control_message)))
 		return -EINVAL;
 
 	if (copy_from_user(target, buf, target_len))
 		return -EFAULT;
 
-	return count;
+	return 0;
 }
 
-static ssize_t ocfs2_control_validate_handshake(struct file *file,
-						const char __user *buf,
-						size_t count)
+static ssize_t ocfs2_control_validate_protocol(struct file *file,
+					       const char __user *buf,
+					       size_t count)
 {
 	ssize_t ret;
 	char kbuf[OCFS2_CONTROL_PROTO_LEN];
 
 	ret = ocfs2_control_cfu(kbuf, OCFS2_CONTROL_PROTO_LEN,
 				buf, count);
-	if (ret != count)
+	if (ret)
 		return ret;
 
 	if (strncmp(kbuf, OCFS2_CONTROL_PROTO, OCFS2_CONTROL_PROTO_LEN))
 		return -EINVAL;
 
-	atomic_inc(&ocfs2_control_opened);
 	ocfs2_control_set_handshake_state(file,
-					  OCFS2_CONTROL_HANDSHAKE_VALID);
-
+					  OCFS2_CONTROL_HANDSHAKE_PROTOCOL);
 
 	return count;
 }
@@ -219,45 +261,92 @@ static void ocfs2_control_send_down(const char *uuid,
 	mutex_unlock(&ocfs2_control_lock);
 }
 
-/* DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline> */
-struct ocfs2_control_message_down {
-	char	tag[OCFS2_CONTROL_MESSAGE_DOWN_LEN];
-	char	space1;
-	char	uuid[OCFS2_TEXT_UUID_LEN];
-	char	space2;
-	char	nodestr[OCFS2_CONTROL_MESSAGE_NODENUM_LEN];
-	char	newline;
-};
+/*
+ * Called whenever configuration elements are sent to /dev/ocfs2_control.
+ * If all configuration elements are present, try to set the global
+ * values.  If not, return -EAGAIN.  If there is a problem, return a
+ * different error.
+ */
+static int ocfs2_control_install_private(struct file *file)
+{
+	int rc = 0;
+	int set_p = 1;
+	struct ocfs2_control_private *p = file->private_data;
 
-static ssize_t ocfs2_control_message(struct file *file,
-				     const char __user *buf,
-				     size_t count)
+	BUG_ON(p->op_state != OCFS2_CONTROL_HANDSHAKE_PROTOCOL);
+
+	if (p->op_this_node < 0)
+		set_p = 0;
+
+	mutex_lock(&ocfs2_control_lock);
+	if (ocfs2_control_this_node < 0) {
+		if (set_p)
+			ocfs2_control_this_node = p->op_this_node;
+	} else if (ocfs2_control_this_node != p->op_this_node)
+		rc = -EINVAL;
+	mutex_unlock(&ocfs2_control_lock);
+
+	if (!rc && set_p) {
+		/* We set the global values successfully */
+		atomic_inc(&ocfs2_control_opened);
+		ocfs2_control_set_handshake_state(file,
+					OCFS2_CONTROL_HANDSHAKE_VALID);
+	}
+
+	return rc;
+}
+
+static int ocfs2_control_do_setnode_msg(struct file *file,
+					struct ocfs2_control_message_setn *msg)
 {
-	ssize_t ret;
-	char *p = NULL;
 	long nodenum;
-	struct ocfs2_control_message_down msg;
+	char *ptr = NULL;
+	struct ocfs2_control_private *p = file->private_data;
 
-	/* Try to catch padding issues */
-	WARN_ON(offsetof(struct ocfs2_control_message_down, uuid) !=
-		(sizeof(msg.tag) + sizeof(msg.space1)));
+	if (ocfs2_control_get_handshake_state(file) !=
+	    OCFS2_CONTROL_HANDSHAKE_PROTOCOL)
+		return -EINVAL;
 
-	memset(&msg, 0, sizeof(struct ocfs2_control_message_down));
-	ret = ocfs2_control_cfu(&msg, OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN,
-				buf, count);
-	if (ret != count)
-		return ret;
+	if (strncmp(msg->tag, OCFS2_CONTROL_MESSAGE_SETNODE_OP,
+		    OCFS2_CONTROL_MESSAGE_OP_LEN))
+		return -EINVAL;
+
+	if ((msg->space != ' ') || (msg->newline != '\n'))
+		return -EINVAL;
+	msg->space = msg->newline = '\0';
+
+	nodenum = simple_strtol(msg->nodestr, &ptr, 16);
+	if (!ptr || *ptr)
+		return -EINVAL;
+
+	if ((nodenum == LONG_MIN) || (nodenum == LONG_MAX) ||
+	    (nodenum > INT_MAX) || (nodenum < 0))
+		return -ERANGE;
+	p->op_this_node = nodenum;
+
+	return ocfs2_control_install_private(file);
+}
+
+static int ocfs2_control_do_down_msg(struct file *file,
+				     struct ocfs2_control_message_down *msg)
+{
+	long nodenum;
+	char *p = NULL;
+
+	if (ocfs2_control_get_handshake_state(file) !=
+	    OCFS2_CONTROL_HANDSHAKE_VALID)
+		return -EINVAL;
 
-	if (strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_DOWN,
-		    strlen(OCFS2_CONTROL_MESSAGE_DOWN)))
+	if (strncmp(msg->tag, OCFS2_CONTROL_MESSAGE_DOWN_OP,
+		    OCFS2_CONTROL_MESSAGE_OP_LEN))
 		return -EINVAL;
 
-	if ((msg.space1 != ' ') || (msg.space2 != ' ') ||
-	    (msg.newline != '\n'))
+	if ((msg->space1 != ' ') || (msg->space2 != ' ') ||
+	    (msg->newline != '\n'))
 		return -EINVAL;
-	msg.space1 = msg.space2 = msg.newline = '\0';
+	msg->space1 = msg->space2 = msg->newline = '\0';
 
-	nodenum = simple_strtol(msg.nodestr, &p, 16);
+	nodenum = simple_strtol(msg->nodestr, &p, 16);
 	if (!p || *p)
 		return -EINVAL;
 
@@ -265,9 +354,40 @@ static ssize_t ocfs2_control_message(struct file *file,
 	    (nodenum > INT_MAX) || (nodenum < 0))
 		return -ERANGE;
 
-	ocfs2_control_send_down(msg.uuid, nodenum);
+	ocfs2_control_send_down(msg->uuid, nodenum);
 
-	return count;
+	return 0;
+}
+
+static ssize_t ocfs2_control_message(struct file *file,
+				     const char __user *buf,
+				     size_t count)
+{
+	ssize_t ret;
+	union ocfs2_control_message msg;
+
+	/* Try to catch padding issues */
+	WARN_ON(offsetof(struct ocfs2_control_message_down, uuid) !=
+		(sizeof(msg.u_down.tag) + sizeof(msg.u_down.space1)));
+
+	memset(&msg, 0, sizeof(union ocfs2_control_message));
+	ret = ocfs2_control_cfu(&msg, count, buf, count);
+	if (ret)
+		goto out;
+
+	if ((count == OCFS2_CONTROL_MESSAGE_SETNODE_TOTAL_LEN) &&
+	    !strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_SETNODE_OP,
+		     OCFS2_CONTROL_MESSAGE_OP_LEN))
+		ret = ocfs2_control_do_setnode_msg(file, &msg.u_setn);
+	else if ((count == OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN) &&
+		 !strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_DOWN_OP,
+			  OCFS2_CONTROL_MESSAGE_OP_LEN))
+		ret = ocfs2_control_do_down_msg(file, &msg.u_down);
+	else
+		ret = -EINVAL;
+
+out:
+	return ret ? ret : count;
 }
 
 static ssize_t ocfs2_control_write(struct file *file,
@@ -283,10 +403,11 @@ static ssize_t ocfs2_control_write(struct file *file,
 			break;
 
 		case OCFS2_CONTROL_HANDSHAKE_READ:
-			ret = ocfs2_control_validate_handshake(file, buf,
-							       count);
+			ret = ocfs2_control_validate_protocol(file, buf,
+							      count);
 			break;
 
+		case OCFS2_CONTROL_HANDSHAKE_PROTOCOL:
 		case OCFS2_CONTROL_HANDSHAKE_VALID:
 			ret = ocfs2_control_message(file, buf, count);
 			break;
@@ -350,6 +471,8 @@ static int ocfs2_control_release(struct inode *inode, struct file *file)
 			       "an emergency restart!\n");
 			emergency_restart();
 		}
+		/* Last valid close clears the node number */
+		ocfs2_control_this_node = -1;
 	}
 
 out:
@@ -370,6 +493,7 @@ static int ocfs2_control_open(struct inode *inode, struct file *file)
 	p = kzalloc(sizeof(struct ocfs2_control_private), GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
+	p->op_this_node = -1;
 
 	mutex_lock(&ocfs2_control_lock);
 	file->private_data = p;
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 31/62] ocfs2: Add the 'set version' message to the ocfs2_control device.
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The "SETV" message sets the filesystem locking protocol version as
negotiated by the client.  The client negotiates based on the maximum
version advertised in /sys/fs/ocfs2/max_locking_protocol.
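
One plausible way for a client to choose the SETV version, given the check in `ocfs2_control_do_setversion_msg()` that the major must equal the advertised maximum's major and the minor must not exceed its minor. This is a sketch: `format_setv()` is hypothetical, and reading and parsing /sys/fs/ocfs2/max_locking_protocol is left out, so the advertised maximum is passed in as two numbers.

```c
#include <stdio.h>

/*
 * Build "SETV <major> <minor>\n" into @buf (at least 12 bytes).
 * @my_*: highest version the daemon speaks; @max_*: what the
 * filesystem advertises.  Returns 11, or -1 if incompatible.
 */
static int format_setv(char *buf, size_t buflen,
		       unsigned int my_major, unsigned int my_minor,
		       unsigned int max_major, unsigned int max_minor)
{
	unsigned int minor;

	/* Majors must match exactly and fit in 1..255. */
	if (my_major != max_major || my_major < 1 || my_major > 0xff)
		return -1;
	/* Never announce a minor newer than the advertised maximum. */
	minor = my_minor < max_minor ? my_minor : max_minor;
	if (minor > 0xff || buflen < 12)
		return -1;

	return snprintf(buf, buflen, "SETV %02X %02X\n", my_major, minor);
}
```

Falling back to the smaller of the two minors is an assumption about client policy; the kernel only enforces the bounds.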

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stack_user.c |  131 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 119 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index 43e6105..de982c1 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -40,7 +40,7 @@
  * unknown, -EINVAL is returned.  Once the negotiation is complete, the
  * client can start sending messages.
  *
- * The T01 protocol only has two messages.  First is the "SETN" message.
+ * The T01 protocol has three messages.  First is the "SETN" message.
  * It has the following syntax:
  *
  *  SETN<space><8-char-hex-nodenum><newline>
@@ -50,8 +50,22 @@
  * The "SETN" message must be the first message following the protocol.
  * It tells ocfs2_control the local node number.
  *
- * Once the local node number has been set, the "DOWN" message can be
- * sent for node down notification.  It has the following syntax:
+ * Next comes the "SETV" message.  It has the following syntax:
+ *
+ *  SETV<space><2-char-hex-major><space><2-char-hex-minor><newline>
+ *
+ * This is 11 characters.
+ *
+ * The "SETV" message sets the filesystem locking protocol version as
+ * negotiated by the client.  The client negotiates based on the maximum
+ * version advertised in /sys/fs/ocfs2/max_locking_protocol.  The major
+ * number from the "SETV" message must match
+ * user_stack.sp_proto->lp_max_version.pv_major, and the minor number
+ * must be less than or equal to ...->lp_max_version.pv_minor.
+ *
+ * Once this information has been set, mounts will be allowed.  From this
+ * point on, the "DOWN" message can be sent for node down notification.
+ * It has the following syntax:
  *
  *  DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline>
  *
@@ -79,9 +93,12 @@
 #define OCFS2_CONTROL_MESSAGE_OP_LEN		4
 #define OCFS2_CONTROL_MESSAGE_SETNODE_OP	"SETN"
 #define OCFS2_CONTROL_MESSAGE_SETNODE_TOTAL_LEN	14
+#define OCFS2_CONTROL_MESSAGE_SETVERSION_OP	"SETV"
+#define OCFS2_CONTROL_MESSAGE_SETVERSION_TOTAL_LEN	11
 #define OCFS2_CONTROL_MESSAGE_DOWN_OP		"DOWN"
 #define OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN	47
 #define OCFS2_TEXT_UUID_LEN			32
+#define OCFS2_CONTROL_MESSAGE_VERNUM_LEN	2
 #define OCFS2_CONTROL_MESSAGE_NODENUM_LEN	8
 
 /*
@@ -97,6 +114,7 @@ struct ocfs2_control_private {
 	struct list_head op_list;
 	int op_state;
 	int op_this_node;
+	struct ocfs2_protocol_version op_proto;
 };
 
 /* SETN<space><8-char-hex-nodenum><newline> */
@@ -107,6 +125,16 @@ struct ocfs2_control_message_setn {
 	char	newline;
 };
 
+/* SETV<space><2-char-hex-major><space><2-char-hex-minor><newline> */
+struct ocfs2_control_message_setv {
+	char	tag[OCFS2_CONTROL_MESSAGE_OP_LEN];
+	char	space1;
+	char	major[OCFS2_CONTROL_MESSAGE_VERNUM_LEN];
+	char	space2;
+	char	minor[OCFS2_CONTROL_MESSAGE_VERNUM_LEN];
+	char	newline;
+};
+
 /* DOWN<space><32-char-cap-hex-uuid><space><8-char-hex-nodenum><newline> */
 struct ocfs2_control_message_down {
 	char	tag[OCFS2_CONTROL_MESSAGE_OP_LEN];
@@ -120,11 +148,13 @@ struct ocfs2_control_message_down {
 union ocfs2_control_message {
 	char					tag[OCFS2_CONTROL_MESSAGE_OP_LEN];
 	struct ocfs2_control_message_setn	u_setn;
+	struct ocfs2_control_message_setv	u_setv;
 	struct ocfs2_control_message_down	u_down;
 };
 
 static atomic_t ocfs2_control_opened;
 static int ocfs2_control_this_node = -1;
+static struct ocfs2_protocol_version running_proto;
 
 static LIST_HEAD(ocfs2_live_connection_list);
 static LIST_HEAD(ocfs2_control_private_list);
@@ -264,8 +294,9 @@ static void ocfs2_control_send_down(const char *uuid,
 /*
  * Called whenever configuration elements are sent to /dev/ocfs2_control.
  * If all configuration elements are present, try to set the global
- * values.  If not, return -EAGAIN.  If there is a problem, return a
- * different error.
+ * values.  If there is a problem, return an error.  Skip any missing
+ * elements, and only bump ocfs2_control_opened when we have all elements
+ * and are successful.
  */
 static int ocfs2_control_install_private(struct file *file)
 {
@@ -275,15 +306,32 @@ static int ocfs2_control_install_private(struct file *file)
 
 	BUG_ON(p->op_state != OCFS2_CONTROL_HANDSHAKE_PROTOCOL);
 
-	if (p->op_this_node < 0)
+	mutex_lock(&ocfs2_control_lock);
+
+	if (p->op_this_node < 0) {
 		set_p = 0;
+	} else if ((ocfs2_control_this_node >= 0) &&
+		   (ocfs2_control_this_node != p->op_this_node)) {
+		rc = -EINVAL;
+		goto out_unlock;
+	}
 
-	mutex_lock(&ocfs2_control_lock);
-	if (ocfs2_control_this_node < 0) {
-		if (set_p)
-			ocfs2_control_this_node = p->op_this_node;
-	} else if (ocfs2_control_this_node != p->op_this_node)
+	if (!p->op_proto.pv_major) {
+		set_p = 0;
+	} else if (!list_empty(&ocfs2_live_connection_list) &&
+		   ((running_proto.pv_major != p->op_proto.pv_major) ||
+		    (running_proto.pv_minor != p->op_proto.pv_minor))) {
 		rc = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (set_p) {
+		ocfs2_control_this_node = p->op_this_node;
+		running_proto.pv_major = p->op_proto.pv_major;
+		running_proto.pv_minor = p->op_proto.pv_minor;
+	}
+
+out_unlock:
 	mutex_unlock(&ocfs2_control_lock);
 
 	if (!rc && set_p) {
@@ -327,6 +375,56 @@ static int ocfs2_control_do_setnode_msg(struct file *file,
 	return ocfs2_control_install_private(file);
 }
 
+static int ocfs2_control_do_setversion_msg(struct file *file,
+					   struct ocfs2_control_message_setv *msg)
+{
+	long major, minor;
+	char *ptr = NULL;
+	struct ocfs2_control_private *p = file->private_data;
+	struct ocfs2_protocol_version *max =
+		&user_stack.sp_proto->lp_max_version;
+
+	if (ocfs2_control_get_handshake_state(file) !=
+	    OCFS2_CONTROL_HANDSHAKE_PROTOCOL)
+		return -EINVAL;
+
+	if (strncmp(msg->tag, OCFS2_CONTROL_MESSAGE_SETVERSION_OP,
+		    OCFS2_CONTROL_MESSAGE_OP_LEN))
+		return -EINVAL;
+
+	if ((msg->space1 != ' ') || (msg->space2 != ' ') ||
+	    (msg->newline != '\n'))
+		return -EINVAL;
+	msg->space1 = msg->space2 = msg->newline = '\0';
+
+	major = simple_strtol(msg->major, &ptr, 16);
+	if (!ptr || *ptr)
+		return -EINVAL;
+	minor = simple_strtol(msg->minor, &ptr, 16);
+	if (!ptr || *ptr)
+		return -EINVAL;
+
+	/*
+	 * The major must be between 1 and 255, inclusive.  The minor
+	 * must be between 0 and 255, inclusive.  The version passed in
+	 * must be within the maximum version supported by the filesystem.
+	 */
+	if ((major == LONG_MIN) || (major == LONG_MAX) ||
+	    (major > (u8)-1) || (major < 1))
+		return -ERANGE;
+	if ((minor == LONG_MIN) || (minor == LONG_MAX) ||
+	    (minor > (u8)-1) || (minor < 0))
+		return -ERANGE;
+	if ((major != max->pv_major) ||
+	    (minor > max->pv_minor))
+		return -EINVAL;
+
+	p->op_proto.pv_major = major;
+	p->op_proto.pv_minor = minor;
+
+	return ocfs2_control_install_private(file);
+}
+
 static int ocfs2_control_do_down_msg(struct file *file,
 				     struct ocfs2_control_message_down *msg)
 {
@@ -379,6 +477,10 @@ static ssize_t ocfs2_control_message(struct file *file,
 	    !strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_SETNODE_OP,
 		     OCFS2_CONTROL_MESSAGE_OP_LEN))
 		ret = ocfs2_control_do_setnode_msg(file, &msg.u_setn);
+	else if ((count == OCFS2_CONTROL_MESSAGE_SETVERSION_TOTAL_LEN) &&
+		 !strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_SETVERSION_OP,
+			  OCFS2_CONTROL_MESSAGE_OP_LEN))
+		ret = ocfs2_control_do_setversion_msg(file, &msg.u_setv);
 	else if ((count == OCFS2_CONTROL_MESSAGE_DOWN_TOTAL_LEN) &&
 		 !strncmp(msg.tag, OCFS2_CONTROL_MESSAGE_DOWN_OP,
 			  OCFS2_CONTROL_MESSAGE_OP_LEN))
@@ -471,8 +573,13 @@ static int ocfs2_control_release(struct inode *inode, struct file *file)
 			       "an emergency restart!\n");
 			emergency_restart();
 		}
-		/* Last valid close clears the node number */
+		/*
+		 * Last valid close clears the node number and resets
+		 * the locking protocol version
+		 */
 		ocfs2_control_this_node = -1;
+		running_proto.pv_major = 0;
+		running_proto.pv_minor = 0;
 	}
 
 out:
-- 
1.5.4.1
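For readers following the control-device handshake, here is a minimal userspace sketch of the validation that ocfs2_control_do_setversion_msg() performs on a SETV message. It assumes OCFS2_CONTROL_MESSAGE_OP_LEN is 4 and OCFS2_CONTROL_MESSAGE_VERNUM_LEN is 2, so a message looks like "SETV 01 00\n". Those widths, the struct and function names, and the hand-rolled hex parser (used here in place of simple_strtol()) are illustrative assumptions, not code from the patch.

```c
#include <errno.h>
#include <string.h>

/* Assumed field widths; the patch uses OCFS2_CONTROL_MESSAGE_OP_LEN and
 * OCFS2_CONTROL_MESSAGE_VERNUM_LEN, whose values are not shown here. */
#define OP_LEN		4
#define VERNUM_LEN	2

/* Same shape as struct ocfs2_control_message_setv: chars only, so no
 * padding on common ABIs and sizeof() equals the wire length (11). */
struct setv_msg {
	char tag[OP_LEN];
	char space1;
	char major[VERNUM_LEN];
	char space2;
	char minor[VERNUM_LEN];
	char newline;
};

struct proto_version {
	unsigned char pv_major;
	unsigned char pv_minor;
};

/* Parse a fixed-width hex field; every character must be a hex digit. */
static int parse_hex_field(const char *field, size_t len, long *out)
{
	long val = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		char c = field[i];
		int d;

		if (c >= '0' && c <= '9')
			d = c - '0';
		else if (c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else
			return -EINVAL;
		val = val * 16 + d;
	}
	*out = val;
	return 0;
}

/* Validate a SETV message against the maximum supported version. */
int parse_setv(const char *buf, size_t count,
	       const struct proto_version *max, struct proto_version *out)
{
	struct setv_msg msg;
	long major, minor;

	if (count != sizeof(msg))
		return -EINVAL;
	memcpy(&msg, buf, sizeof(msg));

	if (strncmp(msg.tag, "SETV", OP_LEN))
		return -EINVAL;
	if (msg.space1 != ' ' || msg.space2 != ' ' || msg.newline != '\n')
		return -EINVAL;
	if (parse_hex_field(msg.major, VERNUM_LEN, &major) ||
	    parse_hex_field(msg.minor, VERNUM_LEN, &minor))
		return -EINVAL;

	/* As in the patch: major must be 1..255, minor 0..255. */
	if (major < 1 || major > 255 || minor < 0 || minor > 255)
		return -ERANGE;
	/* Same major as the filesystem, minor no newer than it supports. */
	if (major != max->pv_major || minor > max->pv_minor)
		return -EINVAL;

	out->pv_major = (unsigned char)major;
	out->pv_minor = (unsigned char)minor;
	return 0;
}
```

In the kernel, the parsed version is only stashed in the per-file private data; ocfs2_control_install_private() commits it to running_proto, and rejects a version that differs from running_proto while live connections exist.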


* [Ocfs2-devel] [PATCH 32/62] ocfs2: add fsdlm to stackglue
  2008-04-02 20:14                                                             ` [Ocfs2-devel] [PATCH 31/62] ocfs2: Add the 'set version' message to the ocfs2_control device Mark Fasheh
@ 2008-04-02 20:14                                                               ` Mark Fasheh
  2008-04-02 20:14                                                                 ` [Ocfs2-devel] [PATCH 33/62] ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, David Teigland

From: David Teigland <teigland@redhat.com>

Add code to use fs/dlm.

[ Modified to be part of the stack_user module -- Joel ]
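One detail in the patch below deserves a note: when the caller has not supplied an LVB buffer, user_dlm_lock() points sb_lvbptr at the bytes immediately following the struct dlm_lksb inside union ocfs2_dlm_lksb, and the fsdlm_lksb_plus_lvb member exists only to guarantee those bytes belong to the union. The sketch below reproduces that layout trick in userspace with simplified stand-in types; fake_dlm_lksb, LVB_LEN = 32 and attach_inline_lvb() are assumptions for illustration (the real struct dlm_lksb lives in <linux/dlm.h>, and the real union also carries the o2dlm lksb).

```c
#include <stddef.h>
#include <string.h>

#define LVB_LEN 32	/* assumed stand-in for DLM_LVB_LEN */

/* Simplified stand-in for struct dlm_lksb: the LVB is only a pointer. */
struct fake_dlm_lksb {
	int sb_status;
	unsigned int sb_lkid;
	char sb_flags;
	char *sb_lvbptr;
};

/* Mirrors fsdlm_lksb_plus_lvb: reserves LVB space right after the lksb. */
struct lksb_plus_lvb {
	struct fake_dlm_lksb lksb;
	char lvb[LVB_LEN];
};

union lksb_union {
	struct fake_dlm_lksb lksb_fsdlm;
	struct lksb_plus_lvb padding;	/* only here to size the union */
};

/* Same pointer arithmetic as user_dlm_lock(): if no LVB buffer is
 * attached, use the bytes that follow the lksb inside the union. */
void attach_inline_lvb(union lksb_union *u)
{
	if (!u->lksb_fsdlm.sb_lvbptr)
		u->lksb_fsdlm.sb_lvbptr = (char *)u +
					  sizeof(struct fake_dlm_lksb);
}
```

Because lvb is a char array, it starts exactly at sizeof(struct fake_dlm_lksb) within the padding struct, so the computed pointer lands on padding.lvb and never runs past the end of the union.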

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/stack_user.c |  216 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/ocfs2/stackglue.c  |   14 +++-
 fs/ocfs2/stackglue.h  |   19 ++++-
 3 files changed, 243 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index de982c1..7428663 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -24,6 +24,7 @@
 #include <linux/reboot.h>
 #include <asm/uaccess.h>
 
+#include "ocfs2.h"  /* For struct ocfs2_lock_res */
 #include "stackglue.h"
 
 
@@ -152,6 +153,8 @@ union ocfs2_control_message {
 	struct ocfs2_control_message_down	u_down;
 };
 
+static struct ocfs2_stack_plugin user_stack;
+
 static atomic_t ocfs2_control_opened;
 static int ocfs2_control_this_node = -1;
 static struct ocfs2_protocol_version running_proto;
@@ -344,6 +347,20 @@ out_unlock:
 	return rc;
 }
 
+static int ocfs2_control_get_this_node(void)
+{
+	int rc;
+
+	mutex_lock(&ocfs2_control_lock);
+	if (ocfs2_control_this_node < 0)
+		rc = -EINVAL;
+	else
+		rc = ocfs2_control_this_node;
+	mutex_unlock(&ocfs2_control_lock);
+
+	return rc;
+}
+
 static int ocfs2_control_do_setnode_msg(struct file *file,
 					struct ocfs2_control_message_setn *msg)
 {
@@ -652,13 +669,210 @@ static void ocfs2_control_exit(void)
 		       -rc);
 }
 
+static struct dlm_lksb *fsdlm_astarg_to_lksb(void *astarg)
+{
+	struct ocfs2_lock_res *res = astarg;
+	return &res->l_lksb.lksb_fsdlm;
+}
+
+static void fsdlm_lock_ast_wrapper(void *astarg)
+{
+	struct dlm_lksb *lksb = fsdlm_astarg_to_lksb(astarg);
+	int status = lksb->sb_status;
+
+	BUG_ON(user_stack.sp_proto == NULL);
+
+	/*
+	 * For now we're punting on the issue of other non-standard errors
+	 * where we can't tell if the unlock_ast or lock_ast should be called.
+	 * The main "other error" that's possible is EINVAL which means the
+	 * function was called with invalid args, which shouldn't be possible
+	 * since the caller here is under our control.  Other non-standard
+	 * errors probably fall into the same category, or otherwise are fatal
+	 * which means we can't carry on anyway.
+	 */
+
+	if (status == -DLM_EUNLOCK || status == -DLM_ECANCEL)
+		user_stack.sp_proto->lp_unlock_ast(astarg, 0);
+	else
+		user_stack.sp_proto->lp_lock_ast(astarg);
+}
+
+static void fsdlm_blocking_ast_wrapper(void *astarg, int level)
+{
+	BUG_ON(user_stack.sp_proto == NULL);
+
+	user_stack.sp_proto->lp_blocking_ast(astarg, level);
+}
+
+static int user_dlm_lock(struct ocfs2_cluster_connection *conn,
+			 int mode,
+			 union ocfs2_dlm_lksb *lksb,
+			 u32 flags,
+			 void *name,
+			 unsigned int namelen,
+			 void *astarg)
+{
+	int ret;
+
+	if (!lksb->lksb_fsdlm.sb_lvbptr)
+		lksb->lksb_fsdlm.sb_lvbptr = (char *)lksb +
+					     sizeof(struct dlm_lksb);
+
+	ret = dlm_lock(conn->cc_lockspace, mode, &lksb->lksb_fsdlm,
+		       flags|DLM_LKF_NODLCKWT, name, namelen, 0,
+		       fsdlm_lock_ast_wrapper, astarg,
+		       fsdlm_blocking_ast_wrapper);
+	return ret;
+}
+
+static int user_dlm_unlock(struct ocfs2_cluster_connection *conn,
+			   union ocfs2_dlm_lksb *lksb,
+			   u32 flags,
+			   void *astarg)
+{
+	int ret;
+
+	ret = dlm_unlock(conn->cc_lockspace, lksb->lksb_fsdlm.sb_lkid,
+			 flags, &lksb->lksb_fsdlm, astarg);
+	return ret;
+}
+
+static int user_dlm_lock_status(union ocfs2_dlm_lksb *lksb)
+{
+	return lksb->lksb_fsdlm.sb_status;
+}
+
+static void *user_dlm_lvb(union ocfs2_dlm_lksb *lksb)
+{
+	return (void *)(lksb->lksb_fsdlm.sb_lvbptr);
+}
+
+static void user_dlm_dump_lksb(union ocfs2_dlm_lksb *lksb)
+{
+}
+
+/*
+ * Compare a requested locking protocol version against the current one.
+ *
+ * If the major numbers are different, they are incompatible.
+ * If the current minor is greater than the request, they are incompatible.
+ * If the current minor is less than or equal to the request, they are
+ * compatible, and the requester should run at the current minor version.
+ */
+static int fs_protocol_compare(struct ocfs2_protocol_version *existing,
+			       struct ocfs2_protocol_version *request)
+{
+	if (existing->pv_major != request->pv_major)
+		return 1;
+
+	if (existing->pv_minor > request->pv_minor)
+		return 1;
+
+	if (existing->pv_minor < request->pv_minor)
+		request->pv_minor = existing->pv_minor;
+
+	return 0;
+}
+
+static int user_cluster_connect(struct ocfs2_cluster_connection *conn)
+{
+	dlm_lockspace_t *fsdlm;
+	struct ocfs2_live_connection *control;
+	int rc = 0;
+
+	BUG_ON(conn == NULL);
+
+	rc = ocfs2_live_connection_new(conn, &control);
+	if (rc)
+		goto out;
+
+	/*
+	 * running_proto must have been set before we allowed any mounts
+	 * to proceed.
+	 */
+	if (fs_protocol_compare(&running_proto, &conn->cc_version)) {
+		printk(KERN_ERR
+		       "Unable to mount with fs locking protocol version "
+		       "%u.%u because the userspace control daemon has "
+		       "negotiated %u.%u\n",
+		       conn->cc_version.pv_major, conn->cc_version.pv_minor,
+		       running_proto.pv_major, running_proto.pv_minor);
+		rc = -EPROTO;
+		ocfs2_live_connection_drop(control);
+		goto out;
+	}
+
+	rc = dlm_new_lockspace(conn->cc_name, strlen(conn->cc_name),
+			       &fsdlm, DLM_LSFL_FS, DLM_LVB_LEN);
+	if (rc) {
+		ocfs2_live_connection_drop(control);
+		goto out;
+	}
+
+	conn->cc_private = control;
+	conn->cc_lockspace = fsdlm;
+out:
+	return rc;
+}
+
+static int user_cluster_disconnect(struct ocfs2_cluster_connection *conn,
+				   int hangup_pending)
+{
+	dlm_release_lockspace(conn->cc_lockspace, 2);
+	conn->cc_lockspace = NULL;
+	ocfs2_live_connection_drop(conn->cc_private);
+	conn->cc_private = NULL;
+	return 0;
+}
+
+static int user_cluster_this_node(unsigned int *this_node)
+{
+	int rc;
+
+	rc = ocfs2_control_get_this_node();
+	if (rc < 0)
+		return rc;
+
+	*this_node = rc;
+	return 0;
+}
+
+static struct ocfs2_stack_operations user_stack_ops = {
+	.connect	= user_cluster_connect,
+	.disconnect	= user_cluster_disconnect,
+	.this_node	= user_cluster_this_node,
+	.dlm_lock	= user_dlm_lock,
+	.dlm_unlock	= user_dlm_unlock,
+	.lock_status	= user_dlm_lock_status,
+	.lock_lvb	= user_dlm_lvb,
+	.dump_lksb	= user_dlm_dump_lksb,
+};
+
+static struct ocfs2_stack_plugin user_stack = {
+	.sp_name	= "user",
+	.sp_ops		= &user_stack_ops,
+	.sp_owner	= THIS_MODULE,
+};
+
+
 static int __init user_stack_init(void)
 {
-	return ocfs2_control_init();
+	int rc;
+
+	rc = ocfs2_control_init();
+	if (!rc) {
+		rc = ocfs2_stack_glue_register(&user_stack);
+		if (rc)
+			ocfs2_control_exit();
+	}
+
+	return rc;
 }
 
 static void __exit user_stack_exit(void)
 {
+	ocfs2_stack_glue_unregister(&user_stack);
 	ocfs2_control_exit();
 }
 
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index bf45d9b..119f60c 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -228,13 +228,20 @@ void ocfs2_stack_glue_set_locking_protocol(struct ocfs2_locking_protocol *proto)
 EXPORT_SYMBOL_GPL(ocfs2_stack_glue_set_locking_protocol);
 
 
+/*
+ * The ocfs2_dlm_lock() and ocfs2_dlm_unlock() functions take
+ * "struct ocfs2_lock_res *astarg" instead of "void *astarg" because the
+ * underlying stack plugins need to pilfer the lksb off of the lock_res.
+ * If some other structure needs to be passed as an astarg, the plugins
+ * will need to be given a different avenue to the lksb.
+ */
 int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   int mode,
 		   union ocfs2_dlm_lksb *lksb,
 		   u32 flags,
 		   void *name,
 		   unsigned int namelen,
-		   void *astarg)
+		   struct ocfs2_lock_res *astarg)
 {
 	BUG_ON(lproto == NULL);
 
@@ -246,7 +253,7 @@ EXPORT_SYMBOL_GPL(ocfs2_dlm_lock);
 int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
 		     union ocfs2_dlm_lksb *lksb,
 		     u32 flags,
-		     void *astarg)
+		     struct ocfs2_lock_res *astarg)
 {
 	BUG_ON(lproto == NULL);
 
@@ -360,7 +367,8 @@ void ocfs2_cluster_hangup(const char *group, int grouplen)
 	BUG_ON(group == NULL);
 	BUG_ON(group[grouplen] != '\0');
 
-	active_stack->sp_ops->hangup(group, grouplen);
+	if (active_stack->sp_ops->hangup)
+		active_stack->sp_ops->hangup(group, grouplen);
 
 	/* cluster_disconnect() was called with hangup_pending==1 */
 	ocfs2_stack_driver_put();
diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
index d88bc65..005e4f1 100644
--- a/fs/ocfs2/stackglue.h
+++ b/fs/ocfs2/stackglue.h
@@ -26,6 +26,7 @@
 #include <linux/dlmconstants.h>
 
 #include "dlm/dlmapi.h"
+#include <linux/dlm.h>
 
 /*
  * dlmconstants.h does not have a LOCAL flag.  We hope to remove it
@@ -60,6 +61,17 @@ struct ocfs2_locking_protocol {
 	void (*lp_unlock_ast)(void *astarg, int error);
 };
 
+
+/*
+ * The dlm_lockstatus struct includes lvb space, but the dlm_lksb struct only
+ * has a pointer to separately allocated lvb space.  This struct exists only to
+ * include in the lksb union to make space for a combined dlm_lksb and lvb.
+ */
+struct fsdlm_lksb_plus_lvb {
+	struct dlm_lksb lksb;
+	char lvb[DLM_LVB_LEN];
+};
+
 /*
  * A union of all lock status structures.  We define it here so that the
  * size of the union is known.  Lock status structures are embedded in
@@ -67,6 +79,8 @@ struct ocfs2_locking_protocol {
  */
 union ocfs2_dlm_lksb {
 	struct dlm_lockstatus lksb_o2dlm;
+	struct dlm_lksb lksb_fsdlm;
+	struct fsdlm_lksb_plus_lvb padding;
 };
 
 /*
@@ -221,17 +235,18 @@ int ocfs2_cluster_disconnect(struct ocfs2_cluster_connection *conn,
 void ocfs2_cluster_hangup(const char *group, int grouplen);
 int ocfs2_cluster_this_node(unsigned int *node);
 
+struct ocfs2_lock_res;
 int ocfs2_dlm_lock(struct ocfs2_cluster_connection *conn,
 		   int mode,
 		   union ocfs2_dlm_lksb *lksb,
 		   u32 flags,
 		   void *name,
 		   unsigned int namelen,
-		   void *astarg);
+		   struct ocfs2_lock_res *astarg);
 int ocfs2_dlm_unlock(struct ocfs2_cluster_connection *conn,
 		     union ocfs2_dlm_lksb *lksb,
 		     u32 flags,
-		     void *astarg);
+		     struct ocfs2_lock_res *astarg);
 
 int ocfs2_dlm_lock_status(union ocfs2_dlm_lksb *lksb);
 void *ocfs2_dlm_lvb(union ocfs2_dlm_lksb *lksb);
-- 
1.5.4.1
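Note that fs_protocol_compare() is asymmetric: the version negotiated by the control daemon (running_proto) is authoritative, and user_cluster_connect() lets a mount proceed only if it speaks the same major and at least the running minor, in which case the mount is downgraded to the running minor. The standalone sketch below copies that logic into userspace C so the three cases can be exercised; struct proto_version and protocol_compare() are stand-in names, not the kernel's.

```c
/* Stand-in for struct ocfs2_protocol_version. */
struct proto_version {
	unsigned char pv_major;
	unsigned char pv_minor;
};

/*
 * Same rules as fs_protocol_compare():
 *  - different majors: incompatible (returns 1)
 *  - running minor newer than the request: incompatible (returns 1)
 *  - otherwise compatible (returns 0), and the request is lowered to
 *    the running minor if it asked for something newer.
 */
int protocol_compare(const struct proto_version *existing,
		     struct proto_version *request)
{
	if (existing->pv_major != request->pv_major)
		return 1;
	if (existing->pv_minor > request->pv_minor)
		return 1;
	if (existing->pv_minor < request->pv_minor)
		request->pv_minor = existing->pv_minor;
	return 0;
}
```

For example, with the daemon running 1.0, a mount offering 1.2 is accepted and runs at 1.0, while a mount offering 2.0 fails with -EPROTO in user_cluster_connect().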


* [Ocfs2-devel] [PATCH 33/62] ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h
  2008-04-02 20:14                                                               ` [Ocfs2-devel] [PATCH 32/62] ocfs2: add fsdlm to stackglue Mark Fasheh
@ 2008-04-02 20:14                                                                 ` Mark Fasheh
  2008-04-02 20:14                                                                   ` [Ocfs2-devel] [PATCH 34/62] ocfs2: Add kbuild for ocfs2_stack_user.ko Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

The masklog code is in the o2cb stack, but ocfs2_lockid.h now needs to
be included by the user stack.  The BUG() in ocfs2_lock_type_string()
does not need masklog support, so change it to a regular BUG_ON().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/ocfs2_lockid.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/ocfs2_lockid.h b/fs/ocfs2/ocfs2_lockid.h
index 86f3e37..82c200f 100644
--- a/fs/ocfs2/ocfs2_lockid.h
+++ b/fs/ocfs2/ocfs2_lockid.h
@@ -100,7 +100,7 @@ static char *ocfs2_lock_type_strings[] = {
 static inline const char *ocfs2_lock_type_string(enum ocfs2_lock_type type)
 {
 #ifdef __KERNEL__
-	mlog_bug_on_msg(type >= OCFS2_NUM_LOCK_TYPES, "%d\n", type);
+	BUG_ON(type >= OCFS2_NUM_LOCK_TYPES);
 #endif
 	return ocfs2_lock_type_strings[type];
 }
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 34/62] ocfs2: Add kbuild for ocfs2_stack_user.ko
  2008-04-02 20:14                                                                 ` [Ocfs2-devel] [PATCH 33/62] ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h Mark Fasheh
@ 2008-04-02 20:14                                                                   ` Mark Fasheh
  2008-04-02 20:14                                                                     ` [Ocfs2-devel] [PATCH 35/62] ocfs2: Allow selection of cluster plug-ins Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Add ocfs2_stack_user.ko to the Makefile so that it builds.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/Makefile |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index b734254..b8d6d02 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -2,7 +2,11 @@ EXTRA_CFLAGS += -Ifs/ocfs2
 
 EXTRA_CFLAGS += -DCATCH_BH_JBD_RACES
 
-obj-$(CONFIG_OCFS2_FS) += ocfs2.o ocfs2_stackglue.o ocfs2_stack_o2cb.o
+obj-$(CONFIG_OCFS2_FS) += 	\
+	ocfs2.o			\
+	ocfs2_stackglue.o	\
+	ocfs2_stack_o2cb.o	\
+	ocfs2_stack_user.o
 
 ocfs2-objs := \
 	alloc.o 		\
@@ -33,6 +37,7 @@ ocfs2-objs := \
 
 ocfs2_stackglue-objs := stackglue.o
 ocfs2_stack_o2cb-objs := stack_o2cb.o
+ocfs2_stack_user-objs := stack_user.o
 
 obj-$(CONFIG_OCFS2_FS) += cluster/
 obj-$(CONFIG_OCFS2_FS) += dlm/
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 35/62] ocfs2: Allow selection of cluster plug-ins.
  2008-04-02 20:14                                                                   ` [Ocfs2-devel] [PATCH 34/62] ocfs2: Add kbuild for ocfs2_stack_user.ko Mark Fasheh
@ 2008-04-02 20:14                                                                     ` Mark Fasheh
  2008-04-02 20:14                                                                       ` [Ocfs2-devel] [PATCH 36/62] ocfs2: Document /sys/fs/ocfs2 Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

ocfs2 now supports plug-ins for the classic O2CB stack as well as
userspace cluster stacks in conjunction with fs/dlm.  This patch allows
zero, one, or both of the plug-ins to be selected in Kconfig.  For local
(non-clustered) mounts, neither plug-in is needed.  Both plug-ins can be
loaded at the same time; at runtime, ocfs2 selects the one needed for the
cluster system in use.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/Kconfig        |   26 ++++++++++++++++++++++++++
 fs/ocfs2/Makefile |   10 ++++++----
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index d731282..c7b50ce 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -444,6 +444,32 @@ config OCFS2_FS
 	  For more information on OCFS2, see the file
 	  <file:Documentation/filesystems/ocfs2.txt>.
 
+config OCFS2_FS_O2CB
+	tristate "O2CB Kernelspace Clustering"
+	depends on OCFS2_FS
+	default y
+	help
+	  OCFS2 includes a simple kernelspace clustering package, the OCFS2
+	  Cluster Base.  It only requires a very small userspace component
+	  to configure it. This comes with the standard ocfs2-tools package.
+	  O2CB is limited to maintaining a cluster for OCFS2 file systems.
+	  It cannot manage any other cluster applications.
+
+	  It is always safe to say Y here, as the clustering method is
+	  run-time selectable.
+
+config OCFS2_FS_USERSPACE_CLUSTER
+	tristate "OCFS2 Userspace Clustering"
+	depends on OCFS2_FS && DLM
+	default y
+	help
+	  This option will allow OCFS2 to use userspace clustering services
+	  in conjunction with the DLM in fs/dlm.  If you are using a
+	  userspace cluster manager, say Y here.
+
+	  It is safe to say Y, as the clustering method is run-time
+	  selectable.
+
 config OCFS2_DEBUG_MASKLOG
 	bool "OCFS2 logging support"
 	depends on OCFS2_FS
diff --git a/fs/ocfs2/Makefile b/fs/ocfs2/Makefile
index b8d6d02..f6956de 100644
--- a/fs/ocfs2/Makefile
+++ b/fs/ocfs2/Makefile
@@ -4,9 +4,10 @@ EXTRA_CFLAGS += -DCATCH_BH_JBD_RACES
 
 obj-$(CONFIG_OCFS2_FS) += 	\
 	ocfs2.o			\
-	ocfs2_stackglue.o	\
-	ocfs2_stack_o2cb.o	\
-	ocfs2_stack_user.o
+	ocfs2_stackglue.o
+
+obj-$(CONFIG_OCFS2_FS_O2CB) += ocfs2_stack_o2cb.o
+obj-$(CONFIG_OCFS2_FS_USERSPACE_CLUSTER) += ocfs2_stack_user.o
 
 ocfs2-objs := \
 	alloc.o 		\
@@ -39,5 +40,6 @@ ocfs2_stackglue-objs := stackglue.o
 ocfs2_stack_o2cb-objs := stack_o2cb.o
 ocfs2_stack_user-objs := stack_user.o
 
+# cluster/ is always needed when OCFS2_FS for masklog support
 obj-$(CONFIG_OCFS2_FS) += cluster/
-obj-$(CONFIG_OCFS2_FS) += dlm/
+obj-$(CONFIG_OCFS2_FS_O2CB) += dlm/
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 36/62] ocfs2: Document /sys/fs/ocfs2
  2008-04-02 20:14                                                                     ` [Ocfs2-devel] [PATCH 35/62] ocfs2: Allow selection of cluster plug-ins Mark Fasheh
@ 2008-04-02 20:14                                                                       ` Mark Fasheh
  2008-04-02 20:14                                                                         ` [Ocfs2-devel] [PATCH 37/62] ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

From: Joel Becker <joel.becker@oracle.com>

Add ABI documentation for these files:

	/sys/fs/ocfs2/max_locking_protocol
	/sys/fs/ocfs2/loaded_cluster_plugins
	/sys/fs/ocfs2/active_cluster_plugin
	/sys/fs/ocfs2/cluster_stack

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 Documentation/ABI/testing/sysfs-ocfs2 |   89 +++++++++++++++++++++++++++++++++
 1 files changed, 89 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-ocfs2

diff --git a/Documentation/ABI/testing/sysfs-ocfs2 b/Documentation/ABI/testing/sysfs-ocfs2
new file mode 100644
index 0000000..b7cc516
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-ocfs2
@@ -0,0 +1,89 @@
+What:		/sys/fs/ocfs2/
+Date:		April 2008
+Contact:	ocfs2-devel at oss.oracle.com
+Description:
+		The /sys/fs/ocfs2 directory contains knobs used by the
+		ocfs2-tools to interact with the filesystem.
+
+What:		/sys/fs/ocfs2/max_locking_protocol
+Date:		April 2008
+Contact:	ocfs2-devel at oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/max_locking_protocol file displays the version
+		of the ocfs2 locking protocol supported by the filesystem.  This version
+		covers how ocfs2 uses distributed locking between cluster
+		nodes.
+
+		The protocol version has a major and minor number.  Two
+		cluster nodes can interoperate if they have an identical
+		major number and an overlapping minor number - thus,
+		a node with version 1.10 can interoperate with a node
+		sporting version 1.8, as long as both use the 1.8 protocol.
+
+		Reading from this file returns a single line, the major
+		number and minor number joined by a period, e.g. "1.10".
+
+		This file is read-only.  The value is compiled into the
+		driver.
+
+What:		/sys/fs/ocfs2/loaded_cluster_plugins
+Date:		April 2008
+Contact:	ocfs2-devel at oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/loaded_cluster_plugins file describes
+		the available plugins to support ocfs2 cluster operation.
+		A cluster plugin is required to use ocfs2 in a cluster.
+		There are currently two available plugins:
+
+		* 'o2cb' - The classic o2cb cluster stack that ocfs2 has
+			used since its inception.
+		* 'user' - A plugin supporting userspace cluster software
+			in conjunction with fs/dlm.
+
+		Reading from this file returns the names of all loaded
+		plugins, one per line.
+
+		This file is read-only.  Its contents may change as
+		plugins are loaded or removed.
+
+What:		/sys/fs/ocfs2/active_cluster_plugin
+Date:		April 2008
+Contact:	ocfs2-devel at oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/active_cluster_plugin file displays which
+		cluster plugin is currently in use by the filesystem.
+		The active plugin will appear in the loaded_cluster_plugins
+		file as well.  Only one plugin can be used at a time.
+
+		Reading from this file returns the name of the active plugin
+		on a single line.
+
+		This file is read-only.  Which plugin is active depends on
+		the cluster stack in use.  The contents may change
+		when all filesystems are unmounted and the cluster stack
+		is changed.
+
+What:		/sys/fs/ocfs2/cluster_stack
+Date:		April 2008
+Contact:	ocfs2-devel at oss.oracle.com
+Description:
+		The /sys/fs/ocfs2/cluster_stack file contains the name
+		of the current ocfs2 cluster stack.  This value is set by
+		userspace tools when bringing the cluster stack online.
+
+		Cluster stack names are 4 characters in length.
+
+		When the 'o2cb' cluster stack is used, the 'o2cb' cluster
+		plugin is active.  All other cluster stacks use the 'user'
+		cluster plugin.
+
+		Reading from this file returns the name of the current
+		cluster stack on a single line.
+
+		Writing a new stack name to this file changes the current
+		cluster stack unless there are mounted ocfs2 filesystems.
+		If there are mounted filesystems, attempts to change the
+		stack return an error.
+
+Users:
+	ocfs2-tools <ocfs2-tools-devel@oss.oracle.com>
-- 
1.5.4.1
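A userspace tool consuming the /sys/fs/ocfs2 files documented above might interpret their contents along these lines. This is a hedged sketch that operates on strings already read from sysfs (file I/O is omitted); parse_locking_protocol() and plugin_for_stack() are hypothetical helper names, not ocfs2-tools functions, and the stack-to-plugin rule is taken directly from the cluster_stack description.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the single line read from max_locking_protocol, e.g. "1.10\n".
 * Returns 0 and fills *major and *minor on success, -1 otherwise.
 */
int parse_locking_protocol(const char *line, unsigned int *major,
			   unsigned int *minor)
{
	char tail;
	int n = sscanf(line, "%u.%u%c", major, minor, &tail);

	/* Accept "M.m" with or without the trailing newline. */
	if (n == 2 || (n == 3 && tail == '\n'))
		return 0;
	return -1;
}

/*
 * Map a cluster_stack value to the plugin the ABI text says will be
 * active: "o2cb" selects the o2cb plugin, any other 4-character name
 * selects "user".  Returns NULL if the name is not 4 characters long.
 */
const char *plugin_for_stack(const char *stack)
{
	size_t len = strcspn(stack, "\n");

	if (len != 4)
		return NULL;
	return strncmp(stack, "o2cb", 4) ? "user" : "o2cb";
}
```

Parsing the major and minor as separate integers matters: as the ABI text's own example implies, 1.10 is newer than 1.8, which a string or floating-point comparison would get wrong.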


* [Ocfs2-devel] [PATCH 37/62] ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle
  2008-04-02 20:14                                                                       ` [Ocfs2-devel] [PATCH 36/62] ocfs2: Document /sys/fs/ocfs2 Mark Fasheh
@ 2008-04-02 20:14                                                                         ` Mark Fasheh
  2008-04-02 20:14                                                                           ` [Ocfs2-devel] [PATCH 38/62] ocfs2/dlm: Create slabcaches for lock and lockres Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch renames the dlm_mle_cache slabcache to o2dlm_mle to prevent
namespace clashes with fs/dlm.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmdomain.c |    4 +++-
 fs/ocfs2/dlm/dlmmaster.c |    2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 0879d86..2ce6207 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -1816,8 +1816,10 @@ static int __init dlm_init(void)
 	dlm_print_version();
 
 	status = dlm_init_mle_cache();
-	if (status)
+	if (status) {
+		mlog(ML_ERROR, "Could not create o2dlm_mle slabcache\n");
 		return -1;
+	}
 
 	status = dlm_register_net_handlers();
 	if (status) {
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index ea6b895..90797c5 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -507,7 +507,7 @@ static void dlm_mle_node_up(struct dlm_ctxt *dlm,
 
 int dlm_init_mle_cache(void)
 {
-	dlm_mle_cache = kmem_cache_create("dlm_mle_cache",
+	dlm_mle_cache = kmem_cache_create("o2dlm_mle",
 					  sizeof(struct dlm_master_list_entry),
 					  0, SLAB_HWCACHE_ALIGN,
 					  NULL);
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 38/62] ocfs2/dlm: Create slabcaches for lock and lockres
  2008-04-02 20:14                                                                         ` [Ocfs2-devel] [PATCH 37/62] ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle Mark Fasheh
@ 2008-04-02 20:14                                                                           ` Mark Fasheh
  2008-04-02 20:14                                                                             ` [Ocfs2-devel] [PATCH 39/62] ocfs2/dlm: Link all lockres' to a tracking list Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch makes o2dlm allocate memory for the lockres, lockname and lock
structures from slabcaches rather than with kmalloc.  This not only makes
these allocations more efficient but also lets us track the memory being
consumed by these structures.
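A note on the teardown pattern in the patch below: dlm_destroy_master_caches() checks each pointer for NULL but does not reset it, and it is called both from the bail path in dlm_init_master_caches() and from the error label in dlm_init(). If o2dlm_lockname creation fails, o2dlm_lockres therefore appears to be destroyed twice. The userspace sketch below models the same init/unwind shape, with malloc()/free() standing in for kmem_cache_create()/kmem_cache_destroy() (all names are hypothetical). It clears each pointer after destroying it, which makes repeated teardown harmless.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the three slab caches. */
static void *lock_cache, *lockres_cache, *lockname_cache;
static int fail_at = -1;	/* test hook: index of the creation to fail */
int destroy_calls;		/* number of caches actually destroyed */

static void *fake_cache_create(int idx)
{
	return idx == fail_at ? NULL : malloc(16);
}

/* NULL-safe and idempotent: clearing the pointer makes a repeat call a no-op. */
static void fake_cache_destroy(void **cache)
{
	if (*cache) {
		free(*cache);
		destroy_calls++;
		*cache = NULL;
	}
}

void destroy_master_caches(void)
{
	fake_cache_destroy(&lockname_cache);
	fake_cache_destroy(&lockres_cache);
}

int init_master_caches(void)
{
	lockres_cache = fake_cache_create(0);
	if (!lockres_cache)
		goto bail;
	lockname_cache = fake_cache_create(1);
	if (!lockname_cache)
		goto bail;
	return 0;
bail:
	destroy_master_caches();	/* the helper cleans up after itself... */
	return -1;
}

int init_lock_cache(void)
{
	lock_cache = fake_cache_create(2);
	return lock_cache ? 0 : -1;
}

void destroy_lock_cache(void)
{
	fake_cache_destroy(&lock_cache);
}

/* Same shape as dlm_init(): one error label tears everything down. */
int init_all(int fail)
{
	fail_at = fail;
	destroy_calls = 0;
	if (init_master_caches())
		goto error;
	if (init_lock_cache())
		goto error;
	return 0;
error:
	destroy_lock_cache();
	destroy_master_caches();	/* ...and is called again here */
	return -1;
}
```

With the pointer reset in place, a single error label can safely call every destroy helper regardless of how far initialization got.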

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmcommon.h |    7 +++++
 fs/ocfs2/dlm/dlmdomain.c |   26 +++++++++++++++++--
 fs/ocfs2/dlm/dlmlock.c   |   22 +++++++++++++++-
 fs/ocfs2/dlm/dlmmaster.c |   61 +++++++++++++++++++++++++++++++++++++---------
 4 files changed, 99 insertions(+), 17 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index dc8ea66..7525a8a 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -963,9 +963,16 @@ static inline void __dlm_wait_on_lockres(struct dlm_lock_resource *res)
 					  DLM_LOCK_RES_MIGRATING));
 }
 
+/* create/destroy slab caches */
+int dlm_init_master_caches(void);
+void dlm_destroy_master_caches(void);
+
+int dlm_init_lock_cache(void);
+void dlm_destroy_lock_cache(void);
 
 int dlm_init_mle_cache(void);
 void dlm_destroy_mle_cache(void);
+
 void dlm_hb_event_notify_attached(struct dlm_ctxt *dlm, int idx, int node_up);
 int dlm_drop_lockres_ref(struct dlm_ctxt *dlm,
 			 struct dlm_lock_resource *res);
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 2ce6207..b092364 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -1818,21 +1818,41 @@ static int __init dlm_init(void)
 	status = dlm_init_mle_cache();
 	if (status) {
 		mlog(ML_ERROR, "Could not create o2dlm_mle slabcache\n");
-		return -1;
+		goto error;
+	}
+
+	status = dlm_init_master_caches();
+	if (status) {
+		mlog(ML_ERROR, "Could not create o2dlm_lockres and "
+		     "o2dlm_lockname slabcaches\n");
+		goto error;
+	}
+
+	status = dlm_init_lock_cache();
+	if (status) {
+		mlog(ML_ERROR, "Could not create o2dlm_lock slabcache\n");
+		goto error;
 	}
 
 	status = dlm_register_net_handlers();
 	if (status) {
-		dlm_destroy_mle_cache();
-		return -1;
+		mlog(ML_ERROR, "Unable to register network handlers\n");
+		goto error;
 	}
 
 	return 0;
+error:
+	dlm_destroy_lock_cache();
+	dlm_destroy_master_caches();
+	dlm_destroy_mle_cache();
+	return -1;
 }
 
 static void __exit dlm_exit (void)
 {
 	dlm_unregister_net_handlers();
+	dlm_destroy_lock_cache();
+	dlm_destroy_master_caches();
 	dlm_destroy_mle_cache();
 }
 
diff --git a/fs/ocfs2/dlm/dlmlock.c b/fs/ocfs2/dlm/dlmlock.c
index 52578d9..83a9f29 100644
--- a/fs/ocfs2/dlm/dlmlock.c
+++ b/fs/ocfs2/dlm/dlmlock.c
@@ -53,6 +53,8 @@
 #define MLOG_MASK_PREFIX ML_DLM
 #include "cluster/masklog.h"
 
+static struct kmem_cache *dlm_lock_cache = NULL;
+
 static DEFINE_SPINLOCK(dlm_cookie_lock);
 static u64 dlm_next_cookie = 1;
 
@@ -64,6 +66,22 @@ static void dlm_init_lock(struct dlm_lock *newlock, int type,
 static void dlm_lock_release(struct kref *kref);
 static void dlm_lock_detach_lockres(struct dlm_lock *lock);
 
+int dlm_init_lock_cache(void)
+{
+	dlm_lock_cache = kmem_cache_create("o2dlm_lock",
+					   sizeof(struct dlm_lock),
+					   0, SLAB_HWCACHE_ALIGN, NULL);
+	if (dlm_lock_cache == NULL)
+		return -ENOMEM;
+	return 0;
+}
+
+void dlm_destroy_lock_cache(void)
+{
+	if (dlm_lock_cache)
+		kmem_cache_destroy(dlm_lock_cache);
+}
+
 /* Tell us whether we can grant a new lock request.
  * locking:
  *   caller needs:  res->spinlock
@@ -353,7 +371,7 @@ static void dlm_lock_release(struct kref *kref)
 		mlog(0, "freeing kernel-allocated lksb\n");
 		kfree(lock->lksb);
 	}
-	kfree(lock);
+	kmem_cache_free(dlm_lock_cache, lock);
 }
 
 /* associate a lock with it's lockres, getting a ref on the lockres */
@@ -412,7 +430,7 @@ struct dlm_lock * dlm_new_lock(int type, u8 node, u64 cookie,
 	struct dlm_lock *lock;
 	int kernel_allocated = 0;
 
-	lock = kzalloc(sizeof(*lock), GFP_NOFS);
+	lock = (struct dlm_lock *) kmem_cache_zalloc(dlm_lock_cache, GFP_NOFS);
 	if (!lock)
 		return NULL;
 
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 90797c5..ac9ed31 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -216,10 +216,10 @@ EXPORT_SYMBOL_GPL(dlm_dump_all_mles);
 
 #endif  /*  0  */
 
-
+static struct kmem_cache *dlm_lockres_cache = NULL;
+static struct kmem_cache *dlm_lockname_cache = NULL;
 static struct kmem_cache *dlm_mle_cache = NULL;
 
-
 static void dlm_mle_release(struct kref *kref);
 static void dlm_init_mle(struct dlm_master_list_entry *mle,
 			enum dlm_mle_type type,
@@ -560,6 +560,35 @@ static void dlm_mle_release(struct kref *kref)
  * LOCK RESOURCE FUNCTIONS
  */
 
+int dlm_init_master_caches(void)
+{
+	dlm_lockres_cache = kmem_cache_create("o2dlm_lockres",
+					      sizeof(struct dlm_lock_resource),
+					      0, SLAB_HWCACHE_ALIGN, NULL);
+	if (!dlm_lockres_cache)
+		goto bail;
+
+	dlm_lockname_cache = kmem_cache_create("o2dlm_lockname",
+					       DLM_LOCKID_NAME_MAX, 0,
+					       SLAB_HWCACHE_ALIGN, NULL);
+	if (!dlm_lockname_cache)
+		goto bail;
+
+	return 0;
+bail:
+	dlm_destroy_master_caches();
+	return -ENOMEM;
+}
+
+void dlm_destroy_master_caches(void)
+{
+	if (dlm_lockname_cache)
+		kmem_cache_destroy(dlm_lockname_cache);
+
+	if (dlm_lockres_cache)
+		kmem_cache_destroy(dlm_lockres_cache);
+}
+
 static void dlm_set_lockres_owner(struct dlm_ctxt *dlm,
 				  struct dlm_lock_resource *res,
 				  u8 owner)
@@ -642,9 +671,9 @@ static void dlm_lockres_release(struct kref *kref)
 	BUG_ON(!list_empty(&res->recovering));
 	BUG_ON(!list_empty(&res->purge));
 
-	kfree(res->lockname.name);
+	kmem_cache_free(dlm_lockname_cache, (void *)res->lockname.name);
 
-	kfree(res);
+	kmem_cache_free(dlm_lockres_cache, res);
 }
 
 void dlm_lockres_put(struct dlm_lock_resource *res)
@@ -700,20 +729,28 @@ struct dlm_lock_resource *dlm_new_lockres(struct dlm_ctxt *dlm,
 				   const char *name,
 				   unsigned int namelen)
 {
-	struct dlm_lock_resource *res;
+	struct dlm_lock_resource *res = NULL;
 
-	res = kmalloc(sizeof(struct dlm_lock_resource), GFP_NOFS);
+	res = (struct dlm_lock_resource *)
+				kmem_cache_zalloc(dlm_lockres_cache, GFP_NOFS);
 	if (!res)
-		return NULL;
+		goto error;
 
-	res->lockname.name = kmalloc(namelen, GFP_NOFS);
-	if (!res->lockname.name) {
-		kfree(res);
-		return NULL;
-	}
+	res->lockname.name = (char *)
+				kmem_cache_zalloc(dlm_lockname_cache, GFP_NOFS);
+	if (!res->lockname.name)
+		goto error;
 
 	dlm_init_lockres(dlm, res, name, namelen);
 	return res;
+
+error:
+	if (res && res->lockname.name)
+		kmem_cache_free(dlm_lockname_cache, (void *)res->lockname.name);
+
+	if (res)
+		kmem_cache_free(dlm_lockres_cache, res);
+	return NULL;
 }
 
 void __dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
-- 
1.5.4.1

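The error handling that dlm_init() adopts above uses one `error:` label that calls every destroy function, and each destroy function tolerates a cache that was never created. The following is a userspace sketch of that pattern only: malloc()/free() stand in for kmem_cache_create()/kmem_cache_destroy(), and the names `create_cache`, `destroy_cache`, `sketch_init`, `sketch_exit` and the `fail_at` failure-injection knob are hypothetical. One difference is deliberate. The sketch's destroy helper also resets the pointer to NULL. The destroy functions in the patch test for NULL but leave the pointer set, so, judging only from the code shown, a failure inside dlm_init_master_caches() (which already calls dlm_destroy_master_caches() on its own bail path) followed by dlm_init()'s error label would appear to destroy o2dlm_lockres a second time.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the o2dlm_mle, master (lockres + lockname)
 * and o2dlm_lock caches, plus the network-handler registration. */
static void *mle_cache;
static void *master_cache;
static void *lock_cache;
static int handlers_registered;

/* Failure injection for testing: 1-4 selects the step that fails, 0 none. */
static int fail_at;

static void *create_cache(int step)
{
	return step == fail_at ? NULL : malloc(64);
}

/* NULL-safe and idempotent: free(NULL) is a no-op and the pointer is
 * cleared, so destroying the same cache twice is harmless. */
static void destroy_cache(void **cache)
{
	free(*cache);
	*cache = NULL;
}

static int register_handlers(void)
{
	if (fail_at == 4)
		return -1;
	handlers_registered = 1;
	return 0;
}

static int sketch_init(void)
{
	mle_cache = create_cache(1);
	if (!mle_cache)
		goto error;

	master_cache = create_cache(2);
	if (!master_cache)
		goto error;

	lock_cache = create_cache(3);
	if (!lock_cache)
		goto error;

	if (register_handlers())
		goto error;

	return 0;
error:
	/* One label: undo everything, newest first; steps that never ran
	 * are no-ops because their pointers are still NULL. */
	destroy_cache(&lock_cache);
	destroy_cache(&master_cache);
	destroy_cache(&mle_cache);
	return -1;
}

static void sketch_exit(void)
{
	handlers_registered = 0;
	destroy_cache(&lock_cache);
	destroy_cache(&master_cache);
	destroy_cache(&mle_cache);
}
```

Tearing down in reverse order of creation keeps dependencies intact, and because every destroy step is idempotent, a single label is enough instead of a ladder of per-step labels, which is the other common kernel idiom.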

* [Ocfs2-devel] [PATCH 39/62] ocfs2/dlm: Link all lockres' to a tracking list
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch links all the lockres' to a tracking list in dlm_ctxt.
We will use this in an upcoming patch that will walk the entire
list and dump the lockres states to a debugfs file.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmcommon.h |    4 ++++
 fs/ocfs2/dlm/dlmdomain.c |   11 +++++++++++
 fs/ocfs2/dlm/dlmmaster.c |   11 +++++++++++
 3 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 7525a8a..cc31abe 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -101,6 +101,7 @@ struct dlm_ctxt
 	struct list_head purge_list;
 	struct list_head pending_asts;
 	struct list_head pending_basts;
+	struct list_head tracking_list;
 	unsigned int purge_count;
 	spinlock_t spinlock;
 	spinlock_t ast_lock;
@@ -270,6 +271,9 @@ struct dlm_lock_resource
 	struct list_head dirty;
 	struct list_head recovering; // dlm_recovery_ctxt.resources list
 
+	/* Added during init and removed during release */
+	struct list_head tracking;	/* dlm->tracking_list */
+
 	/* unused lock resources have their last_used stamped and are
 	 * put on a list for the dlm thread to run. */
 	unsigned long    last_used;
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index b092364..4f7695c 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -644,6 +644,7 @@ int dlm_shutting_down(struct dlm_ctxt *dlm)
 void dlm_unregister_domain(struct dlm_ctxt *dlm)
 {
 	int leave = 0;
+	struct dlm_lock_resource *res;
 
 	spin_lock(&dlm_domain_lock);
 	BUG_ON(dlm->dlm_state != DLM_CTXT_JOINED);
@@ -673,6 +674,15 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
 			msleep(500);
 			mlog(0, "%s: more migration to do\n", dlm->name);
 		}
+
+		/* This list should be empty. If not, print remaining lockres */
+		if (!list_empty(&dlm->tracking_list)) {
+			mlog(ML_ERROR, "Following lockres' are still on the "
+			     "tracking list:\n");
+			list_for_each_entry(res, &dlm->tracking_list, tracking)
+				dlm_print_one_lock_resource(res);
+		}
+
 		dlm_mark_domain_leaving(dlm);
 		dlm_leave_domain(dlm);
 		dlm_complete_dlm_shutdown(dlm);
@@ -1526,6 +1536,7 @@ static struct dlm_ctxt *dlm_alloc_ctxt(const char *domain,
 	INIT_LIST_HEAD(&dlm->reco.node_data);
 	INIT_LIST_HEAD(&dlm->purge_list);
 	INIT_LIST_HEAD(&dlm->dlm_domain_handlers);
+	INIT_LIST_HEAD(&dlm->tracking_list);
 	dlm->reco.state = 0;
 
 	INIT_LIST_HEAD(&dlm->pending_asts);
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index ac9ed31..9713346 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -639,6 +639,14 @@ static void dlm_lockres_release(struct kref *kref)
 	mlog(0, "destroying lockres %.*s\n", res->lockname.len,
 	     res->lockname.name);
 
+	if (!list_empty(&res->tracking))
+		list_del_init(&res->tracking);
+	else {
+		mlog(ML_ERROR, "Resource %.*s not on the Tracking list\n",
+		     res->lockname.len, res->lockname.name);
+		dlm_print_one_lock_resource(res);
+	}
+
 	if (!hlist_unhashed(&res->hash_node) ||
 	    !list_empty(&res->granted) ||
 	    !list_empty(&res->converting) ||
@@ -706,6 +714,7 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
 	INIT_LIST_HEAD(&res->dirty);
 	INIT_LIST_HEAD(&res->recovering);
 	INIT_LIST_HEAD(&res->purge);
+	INIT_LIST_HEAD(&res->tracking);
 	atomic_set(&res->asts_reserved, 0);
 	res->migration_pending = 0;
 	res->inflight_locks = 0;
@@ -721,6 +730,8 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
 
 	res->last_used = 0;
 
+	list_add_tail(&res->tracking, &dlm->tracking_list);
+
 	memset(res->lvb, 0, DLM_LVB_LEN);
 	memset(res->refmap, 0, sizeof(res->refmap));
 }
-- 
1.5.4.1

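The tracking list added above follows the usual intrusive-list life cycle: INIT_LIST_HEAD() on the new lockres, list_add_tail() onto dlm->tracking_list in dlm_init_lockres(), list_del_init() in dlm_lockres_release(), and a walk of whatever is left in dlm_unregister_domain(). The userspace sketch below re-implements just enough of <linux/list.h> to show why list_del_init() rather than a plain delete matters: a removed node points back at itself, so a later list_empty() on it reports "not on any list", which is exactly the condition the release path logs as an error. The structs `ctxt` and `lockres` and the helpers `lockres_init`, `lockres_release` and `report_leftovers` are hypothetical simplifications, and no locking is modelled.

```c
#include <stddef.h>
#include <stdio.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialise, so list_empty() on the entry becomes true. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical, stripped-down stand-ins for dlm_ctxt and dlm_lock_resource. */
struct ctxt { struct list_head tracking_list; };
struct lockres { const char *name; struct list_head tracking; };

static void lockres_init(struct ctxt *c, struct lockres *r, const char *name)
{
	r->name = name;
	INIT_LIST_HEAD(&r->tracking);
	list_add_tail(&r->tracking, &c->tracking_list);
}

/* Returns -1 where the patch would log "not on the Tracking list". */
static int lockres_release(struct lockres *r)
{
	if (list_empty(&r->tracking))
		return -1;
	list_del_init(&r->tracking);
	return 0;
}

/* Shutdown-time walk: print and count anything still tracked. */
static int report_leftovers(struct ctxt *c)
{
	int n = 0;

	for (struct list_head *p = c->tracking_list.next;
	     p != &c->tracking_list; p = p->next) {
		struct lockres *r = container_of(p, struct lockres, tracking);

		printf("still tracked: %s\n", r->name);
		n++;
	}
	return n;
}
```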

* [Ocfs2-devel] [PATCH 40/62] ocfs2/dlm: Create debugfs dirs
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch creates the debugfs directories that will hold the
files to be used to dump the dlm state.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmcommon.h |    2 +
 fs/ocfs2/dlm/dlmdebug.c  |   50 +++++++++++++++++++++++++++++++++++++++++-
 fs/ocfs2/dlm/dlmdebug.h  |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/dlm/dlmdomain.c |   21 +++++++++++++++++-
 4 files changed, 125 insertions(+), 2 deletions(-)
 create mode 100644 fs/ocfs2/dlm/dlmdebug.h

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index cc31abe..6a49140 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -123,6 +123,8 @@ struct dlm_ctxt
 	atomic_t remote_resources;
 	atomic_t unknown_resources;
 
+	struct dentry *dlm_debugfs_subroot;
+
 	/* NOTE: Next three are protected by dlm_domain_lock */
 	struct kref dlm_refs;
 	enum dlm_ctxt_state dlm_state;
diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 64239b3..62e2a4c 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -30,6 +30,7 @@
 #include <linux/utsname.h>
 #include <linux/sysctl.h>
 #include <linux/spinlock.h>
+#include <linux/debugfs.h>
 
 #include "cluster/heartbeat.h"
 #include "cluster/nodemanager.h"
@@ -37,8 +38,8 @@
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
-
 #include "dlmdomain.h"
+#include "dlmdebug.h"
 
 #define MLOG_MASK_PREFIX ML_DLM
 #include "cluster/masklog.h"
@@ -266,3 +267,50 @@ const char *dlm_errname(enum dlm_status err)
 	return dlm_errnames[err];
 }
 EXPORT_SYMBOL_GPL(dlm_errname);
+
+
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *dlm_debugfs_root = NULL;
+
+#define DLM_DEBUGFS_DIR				"o2dlm"
+
+/* subroot - domain dir */
+int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm)
+{
+	dlm->dlm_debugfs_subroot = debugfs_create_dir(dlm->name,
+						      dlm_debugfs_root);
+	if (!dlm->dlm_debugfs_subroot) {
+		mlog_errno(-ENOMEM);
+		goto bail;
+	}
+
+	return 0;
+bail:
+	dlm_destroy_debugfs_subroot(dlm);
+	return -ENOMEM;
+}
+
+void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm)
+{
+	if (dlm->dlm_debugfs_subroot)
+		debugfs_remove(dlm->dlm_debugfs_subroot);
+}
+
+/* debugfs root */
+int dlm_create_debugfs_root(void)
+{
+	dlm_debugfs_root = debugfs_create_dir(DLM_DEBUGFS_DIR, NULL);
+	if (!dlm_debugfs_root) {
+		mlog_errno(-ENOMEM);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+void dlm_destroy_debugfs_root(void)
+{
+	if (dlm_debugfs_root)
+		debugfs_remove(dlm_debugfs_root);
+}
+#endif	/* CONFIG_DEBUG_FS */
diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h
new file mode 100644
index 0000000..b969595
--- /dev/null
+++ b/fs/ocfs2/dlm/dlmdebug.h
@@ -0,0 +1,54 @@
+/* -*- mode: c; c-basic-offset: 8; -*-
+ * vim: noexpandtab sw=8 ts=8 sts=0:
+ *
+ * dlmdebug.h
+ *
+ * Copyright (C) 2008 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 02111-1307, USA.
+ *
+ */
+
+#ifndef DLMDEBUG_H
+#define DLMDEBUG_H
+
+#ifdef CONFIG_DEBUG_FS
+
+int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm);
+void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm);
+
+int dlm_create_debugfs_root(void);
+void dlm_destroy_debugfs_root(void);
+
+#else
+
+static int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm)
+{
+	return 0;
+}
+static void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm)
+{
+}
+static int dlm_create_debugfs_root(void)
+{
+	return 0;
+}
+static void dlm_destroy_debugfs_root(void)
+{
+}
+
+#endif	/* CONFIG_DEBUG_FS */
+#endif	/* DLMDEBUG_H */
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 4f7695c..c137d69 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -33,6 +33,7 @@
 #include <linux/spinlock.h>
 #include <linux/delay.h>
 #include <linux/err.h>
+#include <linux/debugfs.h>
 
 #include "cluster/heartbeat.h"
 #include "cluster/nodemanager.h"
@@ -40,8 +41,8 @@
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
-
 #include "dlmdomain.h"
+#include "dlmdebug.h"
 
 #include "dlmver.h"
 
@@ -298,6 +299,8 @@ static int dlm_wait_on_domain_helper(const char *domain)
 
 static void dlm_free_ctxt_mem(struct dlm_ctxt *dlm)
 {
+	dlm_destroy_debugfs_subroot(dlm);
+
 	if (dlm->lockres_hash)
 		dlm_free_pagevec((void **)dlm->lockres_hash, DLM_HASH_PAGES);
 
@@ -1494,6 +1497,7 @@ static struct dlm_ctxt *dlm_alloc_ctxt(const char *domain,
 				u32 key)
 {
 	int i;
+	int ret;
 	struct dlm_ctxt *dlm = NULL;
 
 	dlm = kzalloc(sizeof(*dlm), GFP_KERNEL);
@@ -1526,6 +1530,15 @@ static struct dlm_ctxt *dlm_alloc_ctxt(const char *domain,
 	dlm->key = key;
 	dlm->node_num = o2nm_this_node();
 
+	ret = dlm_create_debugfs_subroot(dlm);
+	if (ret < 0) {
+		dlm_free_pagevec((void **)dlm->lockres_hash, DLM_HASH_PAGES);
+		kfree(dlm->name);
+		kfree(dlm);
+		dlm = NULL;
+		goto leave;
+	}
+
 	spin_lock_init(&dlm->spinlock);
 	spin_lock_init(&dlm->master_lock);
 	spin_lock_init(&dlm->ast_lock);
@@ -1851,8 +1864,13 @@ static int __init dlm_init(void)
 		goto error;
 	}
 
+	status = dlm_create_debugfs_root();
+	if (status)
+		goto error;
+
 	return 0;
 error:
+	dlm_unregister_net_handlers();
 	dlm_destroy_lock_cache();
 	dlm_destroy_master_caches();
 	dlm_destroy_mle_cache();
@@ -1861,6 +1879,7 @@ error:
 
 static void __exit dlm_exit (void)
 {
+	dlm_destroy_debugfs_root();
 	dlm_unregister_net_handlers();
 	dlm_destroy_lock_cache();
 	dlm_destroy_master_caches();
-- 
1.5.4.1

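dlmdebug.h, as added above, relies on the common compile-time stub pattern: with CONFIG_DEBUG_FS the real functions are declared, and without it, empty definitions let dlmdomain.c call them unconditionally. A minimal userspace sketch of the pattern follows; `FEATURE_DEBUGFS` is a hypothetical stand-in for CONFIG_DEBUG_FS, and `create_debug_subroot`, `destroy_debug_subroot`, `domain_setup` and `domain_teardown` are illustrative names. The stubs here are `static inline`. A function defined plain `static` in a header becomes a separate definition in every file that includes it, and GCC's -Wunused-function (enabled by -Wall) warns in each file that never calls it (for instance, dlmdebug.c itself does not appear to call the stubs when CONFIG_DEBUG_FS is off), whereas unused `static inline` definitions are not reported.

```c
/* FEATURE_DEBUGFS is a hypothetical stand-in for CONFIG_DEBUG_FS. */
struct ctxt;	/* only pointers are passed, so an incomplete type is enough */

#ifdef FEATURE_DEBUGFS
/* Real implementations would live in a separate .c file. */
int create_debug_subroot(struct ctxt *c);
void destroy_debug_subroot(struct ctxt *c);
#else
/* No-op stubs: report success and do nothing when compiled out. */
static inline int create_debug_subroot(struct ctxt *c)
{
	(void)c;
	return 0;
}

static inline void destroy_debug_subroot(struct ctxt *c)
{
	(void)c;
}
#endif

/* Caller code is identical in both configurations; no #ifdef needed. */
static int domain_setup(struct ctxt *c)
{
	return create_debug_subroot(c) < 0 ? -1 : 0;
}

static void domain_teardown(struct ctxt *c)
{
	destroy_debug_subroot(c);
}
```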

* [Ocfs2-devel] [PATCH 41/62] ocfs2/dlm: Dump the dlm state in a debugfs file
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch dumps the dlm state (dlm_ctxt) into a debugfs file.
Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmcommon.h |    1 +
 fs/ocfs2/dlm/dlmdebug.c  |  296 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/dlm/dlmdebug.h  |   20 +++
 fs/ocfs2/dlm/dlmdomain.c |    8 ++
 4 files changed, 325 insertions(+), 0 deletions(-)

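debug_state_open() below renders the whole report into a page-sized buffer once, at open() time, and debug_buffer_read()/debug_buffer_llseek() then only hand out slices of that snapshot, so dlm->spinlock is taken once per open rather than once per read(). The userspace sketch below mirrors that read/seek behaviour. memcpy() stands in for copy_to_user(), `long long` for loff_t and -1 for -EINVAL, and `read_from_buffer`/`buffer_llseek` are hypothetical names, not the kernel's simple_read_from_buffer() itself.

```c
#include <string.h>
#include <sys/types.h>

/* Copy at most `count` bytes starting at *ppos out of a buffer holding
 * `available` bytes, then advance *ppos. Returns 0 at end of data. */
static ssize_t read_from_buffer(char *to, size_t count, long long *ppos,
				const char *from, size_t available)
{
	long long pos = *ppos;

	if (pos < 0)
		return -1;			/* kernel: -EINVAL */
	if ((unsigned long long)pos >= available || count == 0)
		return 0;			/* EOF */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);		/* kernel: copy_to_user() */
	*ppos = pos + (long long)count;
	return (ssize_t)count;
}

/* Like debug_buffer_llseek(): only SEEK_SET (0) and SEEK_CUR (1) are
 * handled; any other whence, or a target outside [0, len], is rejected
 * and the position is left unchanged. */
static long long buffer_llseek(long long *f_pos, long long off, int whence,
			       long long len)
{
	long long new_pos = -1;

	switch (whence) {
	case 0:			/* SEEK_SET */
		new_pos = off;
		break;
	case 1:			/* SEEK_CUR */
		new_pos = *f_pos + off;
		break;
	}
	if (new_pos < 0 || new_pos > len)
		return -1;		/* kernel: -EINVAL */
	*f_pos = new_pos;
	return new_pos;
}
```

The trade-off of the snapshot approach is that the report can never be longer than the buffer; the next patch's lockres dump, whose size grows with the number of lock resources, switches to seq_file instead.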
diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 6a49140..f7a51ca 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -123,6 +123,7 @@ struct dlm_ctxt
 	atomic_t remote_resources;
 	atomic_t unknown_resources;
 
+	struct dlm_debug_ctxt *dlm_debug_ctxt;
 	struct dentry *dlm_debugfs_subroot;
 
 	/* NOTE: Next three are protected by dlm_domain_lock */
diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 62e2a4c..e335403 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -274,6 +274,294 @@ EXPORT_SYMBOL_GPL(dlm_errname);
 static struct dentry *dlm_debugfs_root = NULL;
 
 #define DLM_DEBUGFS_DIR				"o2dlm"
+#define DLM_DEBUGFS_DLM_STATE			"dlm_state"
+
+/* begin - utils funcs */
+static void dlm_debug_free(struct kref *kref)
+{
+	struct dlm_debug_ctxt *dc;
+
+	dc = container_of(kref, struct dlm_debug_ctxt, debug_refcnt);
+
+	kfree(dc);
+}
+
+void dlm_debug_put(struct dlm_debug_ctxt *dc)
+{
+	if (dc)
+		kref_put(&dc->debug_refcnt, dlm_debug_free);
+}
+
+static void dlm_debug_get(struct dlm_debug_ctxt *dc)
+{
+	kref_get(&dc->debug_refcnt);
+}
+
+static int stringify_nodemap(unsigned long *nodemap, int maxnodes,
+			     char *buf, int len)
+{
+	int out = 0;
+	int i = -1;
+
+	while ((i = find_next_bit(nodemap, maxnodes, i + 1)) < maxnodes)
+		out += snprintf(buf + out, len - out, "%d ", i);
+
+	return out;
+}
+
+static struct debug_buffer *debug_buffer_allocate(void)
+{
+	struct debug_buffer *db = NULL;
+
+	db = kzalloc(sizeof(struct debug_buffer), GFP_KERNEL);
+	if (!db)
+		goto bail;
+
+	db->len = PAGE_SIZE;
+	db->buf = kmalloc(db->len, GFP_KERNEL);
+	if (!db->buf)
+		goto bail;
+
+	return db;
+bail:
+	kfree(db);
+	return NULL;
+}
+
+static ssize_t debug_buffer_read(struct file *file, char __user *buf,
+				 size_t nbytes, loff_t *ppos)
+{
+	struct debug_buffer *db = file->private_data;
+
+	return simple_read_from_buffer(buf, nbytes, ppos, db->buf, db->len);
+}
+
+static loff_t debug_buffer_llseek(struct file *file, loff_t off, int whence)
+{
+	struct debug_buffer *db = file->private_data;
+	loff_t new = -1;
+
+	switch (whence) {
+	case 0:
+		new = off;
+		break;
+	case 1:
+		new = file->f_pos + off;
+		break;
+	}
+
+	if (new < 0 || new > db->len)
+		return -EINVAL;
+
+	return (file->f_pos = new);
+}
+
+static int debug_buffer_release(struct inode *inode, struct file *file)
+{
+	struct debug_buffer *db = (struct debug_buffer *)file->private_data;
+
+	if (db)
+		kfree(db->buf);
+	kfree(db);
+
+	return 0;
+}
+/* end - util funcs */
+
+/* begin - debug state funcs */
+static int debug_state_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
+{
+	int out = 0;
+	struct dlm_reco_node_data *node;
+	char *state;
+	int lres, rres, ures, tres;
+
+	lres = atomic_read(&dlm->local_resources);
+	rres = atomic_read(&dlm->remote_resources);
+	ures = atomic_read(&dlm->unknown_resources);
+	tres = lres + rres + ures;
+
+	spin_lock(&dlm->spinlock);
+
+	switch (dlm->dlm_state) {
+	case DLM_CTXT_NEW:
+		state = "NEW"; break;
+	case DLM_CTXT_JOINED:
+		state = "JOINED"; break;
+	case DLM_CTXT_IN_SHUTDOWN:
+		state = "SHUTDOWN"; break;
+	case DLM_CTXT_LEAVING:
+		state = "LEAVING"; break;
+	default:
+		state = "UNKNOWN"; break;
+	}
+
+	/* Domain: xxxxxxxxxx  Key: 0xdfbac769 */
+	out += snprintf(db->buf + out, db->len - out,
+			"Domain: %s  Key: 0x%08x\n", dlm->name, dlm->key);
+
+	/* Thread Pid: xxx  Node: xxx  State: xxxxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Thread Pid: %d  Node: %d  State: %s\n",
+			dlm->dlm_thread_task->pid, dlm->node_num, state);
+
+	/* Number of Joins: xxx  Joining Node: xxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Number of Joins: %d  Joining Node: %d\n",
+			dlm->num_joins, dlm->joining_node);
+
+	/* Domain Map: xx xx xx */
+	out += snprintf(db->buf + out, db->len - out, "Domain Map: ");
+	out += stringify_nodemap(dlm->domain_map, O2NM_MAX_NODES,
+				 db->buf + out, db->len - out);
+	out += snprintf(db->buf + out, db->len - out, "\n");
+
+	/* Live Map: xx xx xx */
+	out += snprintf(db->buf + out, db->len - out, "Live Map: ");
+	out += stringify_nodemap(dlm->live_nodes_map, O2NM_MAX_NODES,
+				 db->buf + out, db->len - out);
+	out += snprintf(db->buf + out, db->len - out, "\n");
+
+	/* Mastered Resources Total: xxx  Locally: xxx  Remotely: ... */
+	out += snprintf(db->buf + out, db->len - out,
+			"Mastered Resources Total: %d  Locally: %d  "
+			"Remotely: %d  Unknown: %d\n",
+			tres, lres, rres, ures);
+
+	/* Lists: Dirty=Empty  Purge=InUse  PendingASTs=Empty  ... */
+	out += snprintf(db->buf + out, db->len - out,
+			"Lists: Dirty=%s  Purge=%s  PendingASTs=%s  "
+			"PendingBASTs=%s  Master=%s\n",
+			(list_empty(&dlm->dirty_list) ? "Empty" : "InUse"),
+			(list_empty(&dlm->purge_list) ? "Empty" : "InUse"),
+			(list_empty(&dlm->pending_asts) ? "Empty" : "InUse"),
+			(list_empty(&dlm->pending_basts) ? "Empty" : "InUse"),
+			(list_empty(&dlm->master_list) ? "Empty" : "InUse"));
+
+	/* Purge Count: xxx  Refs: xxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Purge Count: %d  Refs: %d\n", dlm->purge_count,
+			atomic_read(&dlm->dlm_refs.refcount));
+
+	/* Dead Node: xxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Dead Node: %d\n", dlm->reco.dead_node);
+
+	/* What about DLM_RECO_STATE_FINALIZE? */
+	if (dlm->reco.state == DLM_RECO_STATE_ACTIVE)
+		state = "ACTIVE";
+	else
+		state = "INACTIVE";
+
+	/* Recovery Pid: xxxx  Master: xxx  State: xxxx */
+	out += snprintf(db->buf + out, db->len - out,
+			"Recovery Pid: %d  Master: %d  State: %s\n",
+			dlm->dlm_reco_thread_task->pid,
+			dlm->reco.new_master, state);
+
+	/* Recovery Map: xx xx */
+	out += snprintf(db->buf + out, db->len - out, "Recovery Map: ");
+	out += stringify_nodemap(dlm->recovery_map, O2NM_MAX_NODES,
+				 db->buf + out, db->len - out);
+	out += snprintf(db->buf + out, db->len - out, "\n");
+
+	/* Recovery Node State: */
+	out += snprintf(db->buf + out, db->len - out, "Recovery Node State:\n");
+	list_for_each_entry(node, &dlm->reco.node_data, list) {
+		switch (node->state) {
+		case DLM_RECO_NODE_DATA_INIT:
+			state = "INIT";
+			break;
+		case DLM_RECO_NODE_DATA_REQUESTING:
+			state = "REQUESTING";
+			break;
+		case DLM_RECO_NODE_DATA_DEAD:
+			state = "DEAD";
+			break;
+		case DLM_RECO_NODE_DATA_RECEIVING:
+			state = "RECEIVING";
+			break;
+		case DLM_RECO_NODE_DATA_REQUESTED:
+			state = "REQUESTED";
+			break;
+		case DLM_RECO_NODE_DATA_DONE:
+			state = "DONE";
+			break;
+		case DLM_RECO_NODE_DATA_FINALIZE_SENT:
+			state = "FINALIZE-SENT";
+			break;
+		default:
+			state = "BAD";
+			break;
+		}
+		out += snprintf(db->buf + out, db->len - out, "\t%u - %s\n",
+				node->node_num, state);
+	}
+
+	spin_unlock(&dlm->spinlock);
+
+	return out;
+}
+
+static int debug_state_open(struct inode *inode, struct file *file)
+{
+	struct dlm_ctxt *dlm = inode->i_private;
+	struct debug_buffer *db = NULL;
+
+	db = debug_buffer_allocate();
+	if (!db)
+		goto bail;
+
+	db->len = debug_state_print(dlm, db);
+
+	file->private_data = db;
+
+	return 0;
+bail:
+	return -ENOMEM;
+}
+
+static struct file_operations debug_state_fops = {
+	.open =		debug_state_open,
+	.release =	debug_buffer_release,
+	.read =		debug_buffer_read,
+	.llseek =	debug_buffer_llseek,
+};
+/* end  - debug state funcs */
+
+/* files in subroot */
+int dlm_debug_init(struct dlm_ctxt *dlm)
+{
+	struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt;
+
+	/* for dumping dlm_ctxt */
+	dc->debug_state_dentry = debugfs_create_file(DLM_DEBUGFS_DLM_STATE,
+						     S_IFREG|S_IRUSR,
+						     dlm->dlm_debugfs_subroot,
+						     dlm, &debug_state_fops);
+	if (!dc->debug_state_dentry) {
+		mlog_errno(-ENOMEM);
+		goto bail;
+	}
+
+	dlm_debug_get(dc);
+	return 0;
+
+bail:
+	dlm_debug_shutdown(dlm);
+	return -ENOMEM;
+}
+
+void dlm_debug_shutdown(struct dlm_ctxt *dlm)
+{
+	struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt;
+
+	if (dc) {
+		if (dc->debug_state_dentry)
+			debugfs_remove(dc->debug_state_dentry);
+		dlm_debug_put(dc);
+	}
+}
 
 /* subroot - domain dir */
 int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm)
@@ -285,6 +573,14 @@ int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm)
 		goto bail;
 	}
 
+	dlm->dlm_debug_ctxt = kzalloc(sizeof(struct dlm_debug_ctxt),
+				      GFP_KERNEL);
+	if (!dlm->dlm_debug_ctxt) {
+		mlog_errno(-ENOMEM);
+		goto bail;
+	}
+	kref_init(&dlm->dlm_debug_ctxt->debug_refcnt);
+
 	return 0;
 bail:
 	dlm_destroy_debugfs_subroot(dlm);
diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h
index b969595..94cc10a 100644
--- a/fs/ocfs2/dlm/dlmdebug.h
+++ b/fs/ocfs2/dlm/dlmdebug.h
@@ -27,6 +27,19 @@
 
 #ifdef CONFIG_DEBUG_FS
 
+struct dlm_debug_ctxt {
+	struct kref debug_refcnt;
+	struct dentry *debug_state_dentry;
+};
+
+struct debug_buffer {
+	int len;
+	char *buf;
+};
+
+int dlm_debug_init(struct dlm_ctxt *dlm);
+void dlm_debug_shutdown(struct dlm_ctxt *dlm);
+
 int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm);
 void dlm_destroy_debugfs_subroot(struct dlm_ctxt *dlm);
 
@@ -35,6 +48,13 @@ void dlm_destroy_debugfs_root(void);
 
 #else
 
+static int dlm_debug_init(struct dlm_ctxt *dlm)
+{
+	return 0;
+}
+static void dlm_debug_shutdown(struct dlm_ctxt *dlm)
+{
+}
 static int dlm_create_debugfs_subroot(struct dlm_ctxt *dlm)
 {
 	return 0;
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index c137d69..63f8125 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -398,6 +398,7 @@ static void dlm_destroy_dlm_worker(struct dlm_ctxt *dlm)
 static void dlm_complete_dlm_shutdown(struct dlm_ctxt *dlm)
 {
 	dlm_unregister_domain_handlers(dlm);
+	dlm_debug_shutdown(dlm);
 	dlm_complete_thread(dlm);
 	dlm_complete_recovery_thread(dlm);
 	dlm_destroy_dlm_worker(dlm);
@@ -1418,6 +1419,12 @@ static int dlm_join_domain(struct dlm_ctxt *dlm)
 		goto bail;
 	}
 
+	status = dlm_debug_init(dlm);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}
+
 	status = dlm_launch_thread(dlm);
 	if (status < 0) {
 		mlog_errno(status);
@@ -1485,6 +1492,7 @@ bail:
 
 	if (status) {
 		dlm_unregister_domain_handlers(dlm);
+		dlm_debug_shutdown(dlm);
 		dlm_complete_thread(dlm);
 		dlm_complete_recovery_thread(dlm);
 		dlm_destroy_dlm_worker(dlm);
-- 
1.5.4.1

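debug_state_print() and stringify_nodemap() above build their output with the `out += snprintf(buf + out, len - out, ...)` idiom. snprintf() returns the length the output would have had, not what was stored, so if the text ever outgrew the buffer, `out` would pass `len`, `len - out` would go negative and, converted to size_t, would let the next call write past the end; db->len would also be set beyond the data actually written. With a page-sized buffer the state report is probably short enough in practice, but a clamping formatter (the kernel's scnprintf() has these semantics) removes the question. Below is a userspace sketch of stringify_nodemap() on top of such a formatter; `clamped_snprintf`, `test_bit_sketch` and `stringify_nodemap_sketch` are hypothetical names, and a plain loop over the bits replaces find_next_bit().

```c
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Returns the number of characters actually stored (excluding the NUL):
 * never more than len - 1, and 0 when len == 0. */
static int clamped_snprintf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (len == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	return ((size_t)n >= len) ? (int)(len - 1) : n;
}

static int test_bit_sketch(int nr, const unsigned long *map)
{
	return (int)((map[nr / BITS_PER_LONG_SKETCH] >>
		      (nr % BITS_PER_LONG_SKETCH)) & 1UL);
}

/* Same output shape as stringify_nodemap(): each set bit as "%d ".
 * Because `out` never exceeds len - 1, `len - out` stays positive. */
static int stringify_nodemap_sketch(const unsigned long *map, int maxnodes,
				    char *buf, int len)
{
	int out = 0;

	for (int i = 0; i < maxnodes; i++)
		if (test_bit_sketch(i, map))
			out += clamped_snprintf(buf + out, len - out, "%d ", i);
	return out;
}
```

With a 4-byte buffer and bits 0, 3 and 5 set, the sketch stops at "0 3" instead of overrunning, and the returned count (3) still matches what is in the buffer.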

* [Ocfs2-devel] [PATCH 42/62] ocfs2/dlm: Dumps the lockres' into a debugfs file
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch dumps all the lockres' along with all the locks into
a debugfs file. Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmdebug.c |  247 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/dlm/dlmdebug.h |    8 ++
 2 files changed, 255 insertions(+), 0 deletions(-)

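stringify_lockname() below special-cases dentry locks (names starting with 'N'): it prints the first OCFS2_DENTRY_LOCK_INO_START - 1 bytes as text and then decodes the 8 bytes at offset OCFS2_DENTRY_LOCK_INO_START as a big-endian 64-bit block number. The userspace sketch below reproduces that decoding with explicit shifts, which work regardless of host byte order or alignment. `load_be64` and `stringify_lockname_sketch` are hypothetical names, and the length check before the 8-byte read is an addition of the sketch, not something the patch does. As in the patch, the `(unsigned int)` cast prints only the low 32 bits of the block number.

```c
#include <stdint.h>
#include <stdio.h>

/* Mirrors OCFS2_DENTRY_LOCK_INO_START from the patch. */
#define DENTRY_LOCK_INO_START 18

/* Decode 8 bytes stored most-significant byte first. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static int stringify_lockname_sketch(const char *lockname, int locklen,
				     char *buf, size_t len)
{
	/* Dentry lock: text prefix, then a binary big-endian block number.
	 * The locklen check is an addition of this sketch. */
	if (locklen >= DENTRY_LOCK_INO_START + 8 && lockname[0] == 'N') {
		const unsigned char *raw =
			(const unsigned char *)lockname + DENTRY_LOCK_INO_START;
		uint64_t blkno = load_be64(raw);

		/* As in the patch, only the low 32 bits are printed. */
		return snprintf(buf, len, "%.*s%08x",
				DENTRY_LOCK_INO_START - 1, lockname,
				(unsigned int)blkno);
	}
	/* Any other lock name is printed verbatim. */
	return snprintf(buf, len, "%.*s", locklen, lockname);
}
```

A block number of 0x100000001 would therefore print as 00000001.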
diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index e335403..cccb1ce 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -273,8 +273,35 @@ EXPORT_SYMBOL_GPL(dlm_errname);
 
 static struct dentry *dlm_debugfs_root = NULL;
 
+/* NOTE: This function converts a lockname into a string. It uses knowledge
+ * of the format of the lockname that should be outside the purview of the dlm.
+ * We are adding it only to make dlm debugging slightly easier.
+ *
+ * For more on lockname formats, please refer to dlmglue.c and ocfs2_lockid.h.
+ */
+static int stringify_lockname(const char *lockname, int locklen,
+			      char *buf, int len)
+{
+	int out = 0;
+	__be64 inode_blkno_be;
+
+#define OCFS2_DENTRY_LOCK_INO_START	18
+	if (*lockname == 'N') {
+		memcpy((__be64 *)&inode_blkno_be,
+		       (char *)&lockname[OCFS2_DENTRY_LOCK_INO_START],
+		       sizeof(__be64));
+		out += snprintf(buf + out, len - out, "%.*s%08x",
+				OCFS2_DENTRY_LOCK_INO_START - 1, lockname,
+				(unsigned int)be64_to_cpu(inode_blkno_be));
+	} else
+		out += snprintf(buf + out, len - out, "%.*s",
+				locklen, lockname);
+	return out;
+}
+
 #define DLM_DEBUGFS_DIR				"o2dlm"
 #define DLM_DEBUGFS_DLM_STATE			"dlm_state"
+#define DLM_DEBUGFS_LOCKING_STATE		"locking_state"
 
 /* begin - utils funcs */
 static void dlm_debug_free(struct kref *kref)
@@ -368,6 +395,213 @@ static int debug_buffer_release(struct inode *inode, struct file *file)
 }
 /* end - util funcs */
 
+/* begin - debug lockres funcs */
+static int dump_lock(struct dlm_lock *lock, int list_type, char *buf, int len)
+{
+	int out;
+
+#define DEBUG_LOCK_VERSION	1
+	spin_lock(&lock->spinlock);
+	out = snprintf(buf, len, "LOCK:%d,%d,%d,%d,%d,%d:%lld,%d,%d,%d,%d,%d,"
+		       "%d,%d,%d,%d\n",
+		       DEBUG_LOCK_VERSION,
+		       list_type, lock->ml.type, lock->ml.convert_type,
+		       lock->ml.node,
+		       dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
+		       dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)),
+		       !list_empty(&lock->ast_list),
+		       !list_empty(&lock->bast_list),
+		       lock->ast_pending, lock->bast_pending,
+		       lock->convert_pending, lock->lock_pending,
+		       lock->cancel_pending, lock->unlock_pending,
+		       atomic_read(&lock->lock_refs.refcount));
+	spin_unlock(&lock->spinlock);
+
+	return out;
+}
+
+static int dump_lockres(struct dlm_lock_resource *res, char *buf, int len)
+{
+	struct dlm_lock *lock;
+	int i;
+	int out = 0;
+
+	out += snprintf(buf + out, len - out, "NAME:");
+	out += stringify_lockname(res->lockname.name, res->lockname.len,
+				  buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+#define DEBUG_LRES_VERSION	1
+	out += snprintf(buf + out, len - out,
+			"LRES:%d,%d,%d,%ld,%d,%d,%d,%d,%d,%d,%d\n",
+			DEBUG_LRES_VERSION,
+			res->owner, res->state, res->last_used,
+			!list_empty(&res->purge),
+			!list_empty(&res->dirty),
+			!list_empty(&res->recovering),
+			res->inflight_locks, res->migration_pending,
+			atomic_read(&res->asts_reserved),
+			atomic_read(&res->refs.refcount));
+
+	/* refmap */
+	out += snprintf(buf + out, len - out, "RMAP:");
+	out += stringify_nodemap(res->refmap, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	/* lvb */
+	out += snprintf(buf + out, len - out, "LVBX:");
+	for (i = 0; i < DLM_LVB_LEN; i++)
+		out += snprintf(buf + out, len - out,
+					"%02x", (unsigned char)res->lvb[i]);
+	out += snprintf(buf + out, len - out, "\n");
+
+	/* granted */
+	list_for_each_entry(lock, &res->granted, list)
+		out += dump_lock(lock, 0, buf + out, len - out);
+
+	/* converting */
+	list_for_each_entry(lock, &res->converting, list)
+		out += dump_lock(lock, 1, buf + out, len - out);
+
+	/* blocked */
+	list_for_each_entry(lock, &res->blocked, list)
+		out += dump_lock(lock, 2, buf + out, len - out);
+
+	out += snprintf(buf + out, len - out, "\n");
+
+	return out;
+}
+
+static void *lockres_seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct debug_lockres *dl = m->private;
+	struct dlm_ctxt *dlm = dl->dl_ctxt;
+	struct dlm_lock_resource *res = NULL;
+
+	spin_lock(&dlm->spinlock);
+
+	if (dl->dl_res) {
+		list_for_each_entry(res, &dl->dl_res->tracking, tracking) {
+			if (dl->dl_res) {
+				dlm_lockres_put(dl->dl_res);
+				dl->dl_res = NULL;
+			}
+			if (&res->tracking == &dlm->tracking_list) {
+				mlog(0, "End of list found, %p\n", res);
+				dl = NULL;
+				break;
+			}
+			dlm_lockres_get(res);
+			dl->dl_res = res;
+			break;
+		}
+	} else {
+		if (!list_empty(&dlm->tracking_list)) {
+			list_for_each_entry(res, &dlm->tracking_list, tracking)
+				break;
+			dlm_lockres_get(res);
+			dl->dl_res = res;
+		} else
+			dl = NULL;
+	}
+
+	if (dl) {
+		spin_lock(&dl->dl_res->spinlock);
+		dump_lockres(dl->dl_res, dl->dl_buf, dl->dl_len - 1);
+		spin_unlock(&dl->dl_res->spinlock);
+	}
+
+	spin_unlock(&dlm->spinlock);
+
+	return dl;
+}
+
+static void lockres_seq_stop(struct seq_file *m, void *v)
+{
+}
+
+static void *lockres_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	return NULL;
+}
+
+static int lockres_seq_show(struct seq_file *s, void *v)
+{
+	struct debug_lockres *dl = (struct debug_lockres *)v;
+
+	seq_printf(s, "%s", dl->dl_buf);
+
+	return 0;
+}
+
+static struct seq_operations debug_lockres_ops = {
+	.start =	lockres_seq_start,
+	.stop =		lockres_seq_stop,
+	.next =		lockres_seq_next,
+	.show =		lockres_seq_show,
+};
+
+static int debug_lockres_open(struct inode *inode, struct file *file)
+{
+	struct dlm_ctxt *dlm = inode->i_private;
+	int ret = -ENOMEM;
+	struct seq_file *seq;
+	struct debug_lockres *dl = NULL;
+
+	dl = kzalloc(sizeof(struct debug_lockres), GFP_KERNEL);
+	if (!dl) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	dl->dl_len = PAGE_SIZE;
+	dl->dl_buf = kmalloc(dl->dl_len, GFP_KERNEL);
+	if (!dl->dl_buf) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	ret = seq_open(file, &debug_lockres_ops);
+	if (ret) {
+		mlog_errno(ret);
+		goto bail;
+	}
+
+	seq = (struct seq_file *) file->private_data;
+	seq->private = dl;
+
+	dlm_grab(dlm);
+	dl->dl_ctxt = dlm;
+
+	return 0;
+bail:
+	if (dl)
+		kfree(dl->dl_buf);
+	kfree(dl);
+	return ret;
+}
+
+static int debug_lockres_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = (struct seq_file *)file->private_data;
+	struct debug_lockres *dl = (struct debug_lockres *)seq->private;
+
+	if (dl->dl_res)
+		dlm_lockres_put(dl->dl_res);
+	dlm_put(dl->dl_ctxt);
+	kfree(dl->dl_buf);
+	return seq_release_private(inode, file);
+}
+
+static struct file_operations debug_lockres_fops = {
+	.open =		debug_lockres_open,
+	.release =	debug_lockres_release,
+	.read =		seq_read,
+	.llseek =	seq_lseek,
+};
+/* end - debug lockres funcs */
+
 /* begin - debug state funcs */
 static int debug_state_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
 {
@@ -544,6 +778,17 @@ int dlm_debug_init(struct dlm_ctxt *dlm)
 		goto bail;
 	}
 
+	/* for dumping lockres */
+	dc->debug_lockres_dentry =
+			debugfs_create_file(DLM_DEBUGFS_LOCKING_STATE,
+					    S_IFREG|S_IRUSR,
+					    dlm->dlm_debugfs_subroot,
+					    dlm, &debug_lockres_fops);
+	if (!dc->debug_lockres_dentry) {
+		mlog_errno(-ENOMEM);
+		goto bail;
+	}
+
 	dlm_debug_get(dc);
 	return 0;
 
@@ -557,6 +802,8 @@ void dlm_debug_shutdown(struct dlm_ctxt *dlm)
 	struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt;
 
 	if (dc) {
+		if (dc->debug_lockres_dentry)
+			debugfs_remove(dc->debug_lockres_dentry);
 		if (dc->debug_state_dentry)
 			debugfs_remove(dc->debug_state_dentry);
 		dlm_debug_put(dc);
diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h
index 94cc10a..7c5b2b0 100644
--- a/fs/ocfs2/dlm/dlmdebug.h
+++ b/fs/ocfs2/dlm/dlmdebug.h
@@ -30,6 +30,7 @@
 struct dlm_debug_ctxt {
 	struct kref debug_refcnt;
 	struct dentry *debug_state_dentry;
+	struct dentry *debug_lockres_dentry;
 };
 
 struct debug_buffer {
@@ -37,6 +38,13 @@ struct debug_buffer {
 	char *buf;
 };
 
+struct debug_lockres {
+	int dl_len;
+	char *dl_buf;
+	struct dlm_ctxt *dl_ctxt;
+	struct dlm_lock_resource *dl_res;
+};
+
 int dlm_debug_init(struct dlm_ctxt *dlm);
 void dlm_debug_shutdown(struct dlm_ctxt *dlm);
 
-- 
1.5.4.1
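
The "LRES:" line that dump_lockres() writes is a fixed, versioned, comma-separated record (DEBUG_LRES_VERSION is 1). The following sketch shows how a userspace tool might parse one such line. The struct and function names are hypothetical, and the field order is read off the snprintf() call in dump_lockres() above, not from any documented format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace mirror of the fields dump_lockres() prints
 * on its "LRES:" line, in the same order. */
struct lres_record {
	int version;
	int owner;
	int state;
	long last_used;
	int on_purge_list;
	int on_dirty_list;
	int on_recovery_list;
	int inflight_locks;
	int migration_pending;
	int asts_reserved;
	int refcount;
};

/* Parse one "LRES:..." line. Returns 0 on success, or -1 if the line
 * is not an LRES record, has the wrong number of fields, or carries a
 * layout version this sketch does not know. */
static int parse_lres_line(const char *line, struct lres_record *r)
{
	int n;

	if (strncmp(line, "LRES:", 5) != 0)
		return -1;

	n = sscanf(line + 5, "%d,%d,%d,%ld,%d,%d,%d,%d,%d,%d,%d",
		   &r->version, &r->owner, &r->state, &r->last_used,
		   &r->on_purge_list, &r->on_dirty_list,
		   &r->on_recovery_list, &r->inflight_locks,
		   &r->migration_pending, &r->asts_reserved,
		   &r->refcount);
	if (n != 11)
		return -1;

	/* Only version 1 of the layout is understood here. */
	if (r->version != 1)
		return -1;

	return 0;
}
```

A caller would read locking_state line by line and pass each LRES line to parse_lres_line(). The file lives in the o2dlm directory of debugfs; the exact per-domain path (for example somewhere under /sys/kernel/debug/o2dlm/) is an assumption that depends on where debugfs is mounted. Rejecting unknown versions keeps such a tool from silently misreading a future layout.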


* [Ocfs2-devel] [PATCH 43/62] ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch moves enum dlm_mle_type, struct dlm_lock_name and struct
dlm_master_list_entry from dlmmaster.c to dlmcommon.h. Subsequent patches
need these definitions to dump MLE (master list entry) debugging information.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmcommon.h |   35 +++++++++++++++++++++++++++++++++++
 fs/ocfs2/dlm/dlmmaster.c |   37 -------------------------------------
 2 files changed, 35 insertions(+), 37 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index f7a51ca..d5a86fb 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -49,6 +49,41 @@
 /* Intended to make it easier for us to switch out hash functions */
 #define dlm_lockid_hash(_n, _l) full_name_hash(_n, _l)
 
+enum dlm_mle_type {
+	DLM_MLE_BLOCK,
+	DLM_MLE_MASTER,
+	DLM_MLE_MIGRATION
+};
+
+struct dlm_lock_name {
+	u8 len;
+	u8 name[DLM_LOCKID_NAME_MAX];
+};
+
+struct dlm_master_list_entry {
+	struct list_head list;
+	struct list_head hb_events;
+	struct dlm_ctxt *dlm;
+	spinlock_t spinlock;
+	wait_queue_head_t wq;
+	atomic_t woken;
+	struct kref mle_refs;
+	int inuse;
+	unsigned long maybe_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	unsigned long vote_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	unsigned long response_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	unsigned long node_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	u8 master;
+	u8 new_master;
+	enum dlm_mle_type type;
+	struct o2hb_callback_func mle_hb_up;
+	struct o2hb_callback_func mle_hb_down;
+	union {
+		struct dlm_lock_resource *res;
+		struct dlm_lock_name name;
+	} u;
+};
+
 enum dlm_ast_type {
 	DLM_AST = 0,
 	DLM_BAST,
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 9713346..94cadcb 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -52,43 +52,6 @@
 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_MASTER)
 #include "cluster/masklog.h"
 
-enum dlm_mle_type {
-	DLM_MLE_BLOCK,
-	DLM_MLE_MASTER,
-	DLM_MLE_MIGRATION
-};
-
-struct dlm_lock_name
-{
-	u8 len;
-	u8 name[DLM_LOCKID_NAME_MAX];
-};
-
-struct dlm_master_list_entry
-{
-	struct list_head list;
-	struct list_head hb_events;
-	struct dlm_ctxt *dlm;
-	spinlock_t spinlock;
-	wait_queue_head_t wq;
-	atomic_t woken;
-	struct kref mle_refs;
-	int inuse;
-	unsigned long maybe_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
-	unsigned long vote_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
-	unsigned long response_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
-	unsigned long node_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
-	u8 master;
-	u8 new_master;
-	enum dlm_mle_type type;
-	struct o2hb_callback_func mle_hb_up;
-	struct o2hb_callback_func mle_hb_down;
-	union {
-		struct dlm_lock_resource *res;
-		struct dlm_lock_name name;
-	} u;
-};
-
 static void dlm_mle_node_down(struct dlm_ctxt *dlm,
 			      struct dlm_master_list_entry *mle,
 			      struct o2nm_node *node,
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 44/62] ocfs2/dlm: Dumps the mles into a debugfs file
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch dumps as many MLEs as fit in one page into the debugfs file
mle_state, followed by the total number of MLEs on the master list. This is
useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmdebug.c |  119 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/dlm/dlmdebug.h |    1 +
 2 files changed, 120 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index cccb1ce..6de326b 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -302,6 +302,7 @@ static int stringify_lockname(const char *lockname, int locklen,
 #define DLM_DEBUGFS_DIR				"o2dlm"
 #define DLM_DEBUGFS_DLM_STATE			"dlm_state"
 #define DLM_DEBUGFS_LOCKING_STATE		"locking_state"
+#define DLM_DEBUGFS_MLE_STATE			"mle_state"
 
 /* begin - utils funcs */
 static void dlm_debug_free(struct kref *kref)
@@ -395,6 +396,112 @@ static int debug_buffer_release(struct inode *inode, struct file *file)
 }
 /* end - util funcs */
 
+/* begin - debug mle funcs */
+static int dump_mle(struct dlm_master_list_entry *mle, char *buf, int len)
+{
+	int out = 0;
+	unsigned int namelen;
+	const char *name;
+	char *mle_type;
+
+	if (mle->type != DLM_MLE_MASTER) {
+		namelen = mle->u.name.len;
+		name = mle->u.name.name;
+	} else {
+		namelen = mle->u.res->lockname.len;
+		name = mle->u.res->lockname.name;
+	}
+
+	if (mle->type == DLM_MLE_BLOCK)
+		mle_type = "BLK";
+	else if (mle->type == DLM_MLE_MASTER)
+		mle_type = "MAS";
+	else
+		mle_type = "MIG";
+
+	out += stringify_lockname(name, namelen, buf + out, len - out);
+	out += snprintf(buf + out, len - out,
+			"\t%3s\tmas=%3u\tnew=%3u\tevt=%1d\tuse=%1d\tref=%3d\n",
+			mle_type, mle->master, mle->new_master,
+			!list_empty(&mle->hb_events),
+			!!mle->inuse,
+			atomic_read(&mle->mle_refs.refcount));
+
+	out += snprintf(buf + out, len - out, "Maybe=");
+	out += stringify_nodemap(mle->maybe_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "Vote=");
+	out += stringify_nodemap(mle->vote_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "Response=");
+	out += stringify_nodemap(mle->response_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "Node=");
+	out += stringify_nodemap(mle->node_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "\n");
+
+	return out;
+}
+
+static int debug_mle_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
+{
+	struct dlm_master_list_entry *mle;
+	int out = 0;
+	unsigned long total = 0;
+
+	out += snprintf(db->buf + out, db->len - out,
+			"Dumping MLEs for Domain: %s\n", dlm->name);
+
+	spin_lock(&dlm->master_lock);
+	list_for_each_entry(mle, &dlm->master_list, list) {
+		++total;
+		if (db->len - out < 200)
+			continue;
+		out += dump_mle(mle, db->buf + out, db->len - out);
+	}
+	spin_unlock(&dlm->master_lock);
+
+	out += snprintf(db->buf + out, db->len - out,
+			"Total on list: %ld\n", total);
+	return out;
+}
+
+static int debug_mle_open(struct inode *inode, struct file *file)
+{
+	struct dlm_ctxt *dlm = inode->i_private;
+	struct debug_buffer *db;
+
+	db = debug_buffer_allocate();
+	if (!db)
+		goto bail;
+
+	db->len = debug_mle_print(dlm, db);
+
+	file->private_data = db;
+
+	return 0;
+bail:
+	return -ENOMEM;
+}
+
+static struct file_operations debug_mle_fops = {
+	.open =		debug_mle_open,
+	.release =	debug_buffer_release,
+	.read =		debug_buffer_read,
+	.llseek =	debug_buffer_llseek,
+};
+
+/* end - debug mle funcs */
+
 /* begin - debug lockres funcs */
 static int dump_lock(struct dlm_lock *lock, int list_type, char *buf, int len)
 {
@@ -789,6 +896,16 @@ int dlm_debug_init(struct dlm_ctxt *dlm)
 		goto bail;
 	}
 
+	/* for dumping mles */
+	dc->debug_mle_dentry = debugfs_create_file(DLM_DEBUGFS_MLE_STATE,
+						   S_IFREG|S_IRUSR,
+						   dlm->dlm_debugfs_subroot,
+						   dlm, &debug_mle_fops);
+	if (!dc->debug_mle_dentry) {
+		mlog_errno(-ENOMEM);
+		goto bail;
+	}
+
 	dlm_debug_get(dc);
 	return 0;
 
@@ -802,6 +919,8 @@ void dlm_debug_shutdown(struct dlm_ctxt *dlm)
 	struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt;
 
 	if (dc) {
+		if (dc->debug_mle_dentry)
+			debugfs_remove(dc->debug_mle_dentry);
 		if (dc->debug_lockres_dentry)
 			debugfs_remove(dc->debug_lockres_dentry);
 		if (dc->debug_state_dentry)
diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h
index 7c5b2b0..cbc69f2 100644
--- a/fs/ocfs2/dlm/dlmdebug.h
+++ b/fs/ocfs2/dlm/dlmdebug.h
@@ -31,6 +31,7 @@ struct dlm_debug_ctxt {
 	struct kref debug_refcnt;
 	struct dentry *debug_state_dentry;
 	struct dentry *debug_lockres_dentry;
+	struct dentry *debug_mle_dentry;
 };
 
 struct debug_buffer {
-- 
1.5.4.1
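
debug_mle_print() keeps counting entries after the page fills up: once fewer than 200 bytes remain it stops formatting, so the closing "Total on list" line still fits and the count stays accurate. The sketch below reproduces that count-everything, dump-what-fits pattern in userspace; the names are hypothetical. It also clamps the running offset, which the kernel code does not do: snprintf() returns the length the output would have had, so a bare out += snprintf(...) can push out past len if a single entry is truncated.

```c
#include <stdarg.h>
#include <stdio.h>

/* Mirrors the "db->len - out < 200" check in debug_mle_print(). */
#define ENTRY_RESERVE 200

/* Hypothetical helper: append formatted text at buf + out and return
 * the new offset, never letting it exceed len - 1 (the last byte is
 * kept for the terminating NUL). */
static int append_clamped(char *buf, int len, int out, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (out >= len - 1)
		return out;		/* buffer already full */

	va_start(ap, fmt);
	n = vsnprintf(buf + out, len - out, fmt, ap);
	va_end(ap);

	if (n < 0)
		return out;		/* encoding error: leave offset as is */
	if (n >= len - out)
		return len - 1;		/* truncated: buffer is now full */
	return out + n;
}

/* Format entries while at least ENTRY_RESERVE bytes remain, but count
 * every entry so the total line is always correct. */
static int dump_names(const char *const *names, int count, char *buf, int len)
{
	unsigned long total = 0;
	int out = 0;
	int i;

	out = append_clamped(buf, len, out, "Dumping entries\n");
	for (i = 0; i < count; i++) {
		++total;
		if (len - out < ENTRY_RESERVE)
			continue;	/* keep counting, stop formatting */
		out = append_clamped(buf, len, out, "%s\n", names[i]);
	}
	out = append_clamped(buf, len, out, "Total on list: %lu\n", total);

	return out;
}
```

The reserve only protects the trailer as long as no single entry is larger than it; the clamp in append_clamped() is what keeps the offset bounded in the general case.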


* [Ocfs2-devel] [PATCH 45/62] ocfs2/dlm: Dumps the purgelist into a debugfs file
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

This patch dumps as many lock resources on the purge list as fit in one
page into the debugfs file purge_list, along with the number of seconds
each one has been unused. This is useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmdebug.c |   71 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/dlm/dlmdebug.h |    1 +
 2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 6de326b..a109005 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -303,6 +303,7 @@ static int stringify_lockname(const char *lockname, int locklen,
 #define DLM_DEBUGFS_DLM_STATE			"dlm_state"
 #define DLM_DEBUGFS_LOCKING_STATE		"locking_state"
 #define DLM_DEBUGFS_MLE_STATE			"mle_state"
+#define DLM_DEBUGFS_PURGE_LIST			"purge_list"
 
 /* begin - utils funcs */
 static void dlm_debug_free(struct kref *kref)
@@ -396,6 +397,63 @@ static int debug_buffer_release(struct inode *inode, struct file *file)
 }
 /* end - util funcs */
 
+/* begin - purge list funcs */
+static int debug_purgelist_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
+{
+	struct dlm_lock_resource *res;
+	int out = 0;
+	unsigned long total = 0;
+
+	out += snprintf(db->buf + out, db->len - out,
+			"Dumping Purgelist for Domain: %s\n", dlm->name);
+
+	spin_lock(&dlm->spinlock);
+	list_for_each_entry(res, &dlm->purge_list, purge) {
+		++total;
+		if (db->len - out < 100)
+			continue;
+		spin_lock(&res->spinlock);
+		out += stringify_lockname(res->lockname.name,
+					  res->lockname.len,
+					  db->buf + out, db->len - out);
+		out += snprintf(db->buf + out, db->len - out, "\t%ld\n",
+				(jiffies - res->last_used)/HZ);
+		spin_unlock(&res->spinlock);
+	}
+	spin_unlock(&dlm->spinlock);
+
+	out += snprintf(db->buf + out, db->len - out,
+			"Total on list: %ld\n", total);
+
+	return out;
+}
+
+static int debug_purgelist_open(struct inode *inode, struct file *file)
+{
+	struct dlm_ctxt *dlm = inode->i_private;
+	struct debug_buffer *db;
+
+	db = debug_buffer_allocate();
+	if (!db)
+		goto bail;
+
+	db->len = debug_purgelist_print(dlm, db);
+
+	file->private_data = db;
+
+	return 0;
+bail:
+	return -ENOMEM;
+}
+
+static struct file_operations debug_purgelist_fops = {
+	.open =		debug_purgelist_open,
+	.release =	debug_buffer_release,
+	.read =		debug_buffer_read,
+	.llseek =	debug_buffer_llseek,
+};
+/* end - purge list funcs */
+
 /* begin - debug mle funcs */
 static int dump_mle(struct dlm_master_list_entry *mle, char *buf, int len)
 {
@@ -906,6 +964,17 @@ int dlm_debug_init(struct dlm_ctxt *dlm)
 		goto bail;
 	}
 
+	/* for dumping lockres on the purge list */
+	dc->debug_purgelist_dentry =
+			debugfs_create_file(DLM_DEBUGFS_PURGE_LIST,
+					    S_IFREG|S_IRUSR,
+					    dlm->dlm_debugfs_subroot,
+					    dlm, &debug_purgelist_fops);
+	if (!dc->debug_purgelist_dentry) {
+		mlog_errno(-ENOMEM);
+		goto bail;
+	}
+
 	dlm_debug_get(dc);
 	return 0;
 
@@ -919,6 +988,8 @@ void dlm_debug_shutdown(struct dlm_ctxt *dlm)
 	struct dlm_debug_ctxt *dc = dlm->dlm_debug_ctxt;
 
 	if (dc) {
+		if (dc->debug_purgelist_dentry)
+			debugfs_remove(dc->debug_purgelist_dentry);
 		if (dc->debug_mle_dentry)
 			debugfs_remove(dc->debug_mle_dentry);
 		if (dc->debug_lockres_dentry)
diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h
index cbc69f2..8857743 100644
--- a/fs/ocfs2/dlm/dlmdebug.h
+++ b/fs/ocfs2/dlm/dlmdebug.h
@@ -32,6 +32,7 @@ struct dlm_debug_ctxt {
 	struct dentry *debug_state_dentry;
 	struct dentry *debug_lockres_dentry;
 	struct dentry *debug_mle_dentry;
+	struct dentry *debug_purgelist_dentry;
 };
 
 struct debug_buffer {
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 46/62] ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

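This patch moves dlm_print_one_mle() from dlmmaster.c to dlmdebug.c as part
of consolidating the debugging-related functions there. The new version is
built on dump_mle(), and stringify_lockname(), stringify_nodemap() and
dump_mle() now sit outside the CONFIG_DEBUG_FS section. The
dlm_dump_all_mles() code, which was compiled out with #if 0, is removed.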

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmdebug.c  |  154 ++++++++++++++++++++++++---------------------
 fs/ocfs2/dlm/dlmdebug.h  |    2 +
 fs/ocfs2/dlm/dlmmaster.c |   89 +--------------------------
 3 files changed, 85 insertions(+), 160 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index a109005..58e4579 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -268,11 +268,6 @@ const char *dlm_errname(enum dlm_status err)
 }
 EXPORT_SYMBOL_GPL(dlm_errname);
 
-
-#ifdef CONFIG_DEBUG_FS
-
-static struct dentry *dlm_debugfs_root = NULL;
-
 /* NOTE: This function converts a lockname into a string. It uses knowledge
  * of the format of the lockname that should be outside the purview of the dlm.
  * We are adding only to make dlm debugging slightly easier.
@@ -299,6 +294,88 @@ static int stringify_lockname(const char *lockname, int locklen,
 	return out;
 }
 
+static int stringify_nodemap(unsigned long *nodemap, int maxnodes,
+			     char *buf, int len)
+{
+	int out = 0;
+	int i = -1;
+
+	while ((i = find_next_bit(nodemap, maxnodes, i + 1)) < maxnodes)
+		out += snprintf(buf + out, len - out, "%d ", i);
+
+	return out;
+}
+
+static int dump_mle(struct dlm_master_list_entry *mle, char *buf, int len)
+{
+	int out = 0;
+	unsigned int namelen;
+	const char *name;
+	char *mle_type;
+
+	if (mle->type != DLM_MLE_MASTER) {
+		namelen = mle->u.name.len;
+		name = mle->u.name.name;
+	} else {
+		namelen = mle->u.res->lockname.len;
+		name = mle->u.res->lockname.name;
+	}
+
+	if (mle->type == DLM_MLE_BLOCK)
+		mle_type = "BLK";
+	else if (mle->type == DLM_MLE_MASTER)
+		mle_type = "MAS";
+	else
+		mle_type = "MIG";
+
+	out += stringify_lockname(name, namelen, buf + out, len - out);
+	out += snprintf(buf + out, len - out,
+			"\t%3s\tmas=%3u\tnew=%3u\tevt=%1d\tuse=%1d\tref=%3d\n",
+			mle_type, mle->master, mle->new_master,
+			!list_empty(&mle->hb_events),
+			!!mle->inuse,
+			atomic_read(&mle->mle_refs.refcount));
+
+	out += snprintf(buf + out, len - out, "Maybe=");
+	out += stringify_nodemap(mle->maybe_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "Vote=");
+	out += stringify_nodemap(mle->vote_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "Response=");
+	out += stringify_nodemap(mle->response_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "Node=");
+	out += stringify_nodemap(mle->node_map, O2NM_MAX_NODES,
+				 buf + out, len - out);
+	out += snprintf(buf + out, len - out, "\n");
+
+	out += snprintf(buf + out, len - out, "\n");
+
+	return out;
+}
+
+void dlm_print_one_mle(struct dlm_master_list_entry *mle)
+{
+	char *buf;
+
+	buf = (char *) get_zeroed_page(GFP_NOFS);
+	if (buf) {
+		dump_mle(mle, buf, PAGE_SIZE - 1);
+		free_page((unsigned long)buf);
+	}
+}
+
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *dlm_debugfs_root = NULL;
+
 #define DLM_DEBUGFS_DIR				"o2dlm"
 #define DLM_DEBUGFS_DLM_STATE			"dlm_state"
 #define DLM_DEBUGFS_LOCKING_STATE		"locking_state"
@@ -326,18 +403,6 @@ static void dlm_debug_get(struct dlm_debug_ctxt *dc)
 	kref_get(&dc->debug_refcnt);
 }
 
-static int stringify_nodemap(unsigned long *nodemap, int maxnodes,
-			     char *buf, int len)
-{
-	int out = 0;
-	int i = -1;
-
-	while ((i = find_next_bit(nodemap, maxnodes, i + 1)) < maxnodes)
-		out += snprintf(buf + out, len - out, "%d ", i);
-
-	return out;
-}
-
 static struct debug_buffer *debug_buffer_allocate(void)
 {
 	struct debug_buffer *db = NULL;
@@ -455,61 +520,6 @@ static struct file_operations debug_purgelist_fops = {
 /* end - purge list funcs */
 
 /* begin - debug mle funcs */
-static int dump_mle(struct dlm_master_list_entry *mle, char *buf, int len)
-{
-	int out = 0;
-	unsigned int namelen;
-	const char *name;
-	char *mle_type;
-
-	if (mle->type != DLM_MLE_MASTER) {
-		namelen = mle->u.name.len;
-		name = mle->u.name.name;
-	} else {
-		namelen = mle->u.res->lockname.len;
-		name = mle->u.res->lockname.name;
-	}
-
-	if (mle->type == DLM_MLE_BLOCK)
-		mle_type = "BLK";
-	else if (mle->type == DLM_MLE_MASTER)
-		mle_type = "MAS";
-	else
-		mle_type = "MIG";
-
-	out += stringify_lockname(name, namelen, buf + out, len - out);
-	out += snprintf(buf + out, len - out,
-			"\t%3s\tmas=%3u\tnew=%3u\tevt=%1d\tuse=%1d\tref=%3d\n",
-			mle_type, mle->master, mle->new_master,
-			!list_empty(&mle->hb_events),
-			!!mle->inuse,
-			atomic_read(&mle->mle_refs.refcount));
-
-	out += snprintf(buf + out, len - out, "Maybe=");
-	out += stringify_nodemap(mle->maybe_map, O2NM_MAX_NODES,
-				 buf + out, len - out);
-	out += snprintf(buf + out, len - out, "\n");
-
-	out += snprintf(buf + out, len - out, "Vote=");
-	out += stringify_nodemap(mle->vote_map, O2NM_MAX_NODES,
-				 buf + out, len - out);
-	out += snprintf(buf + out, len - out, "\n");
-
-	out += snprintf(buf + out, len - out, "Response=");
-	out += stringify_nodemap(mle->response_map, O2NM_MAX_NODES,
-				 buf + out, len - out);
-	out += snprintf(buf + out, len - out, "\n");
-
-	out += snprintf(buf + out, len - out, "Node=");
-	out += stringify_nodemap(mle->node_map, O2NM_MAX_NODES,
-				 buf + out, len - out);
-	out += snprintf(buf + out, len - out, "\n");
-
-	out += snprintf(buf + out, len - out, "\n");
-
-	return out;
-}
-
 static int debug_mle_print(struct dlm_ctxt *dlm, struct debug_buffer *db)
 {
 	struct dlm_master_list_entry *mle;
diff --git a/fs/ocfs2/dlm/dlmdebug.h b/fs/ocfs2/dlm/dlmdebug.h
index 8857743..d34a62a 100644
--- a/fs/ocfs2/dlm/dlmdebug.h
+++ b/fs/ocfs2/dlm/dlmdebug.h
@@ -25,6 +25,8 @@
 #ifndef DLMDEBUG_H
 #define DLMDEBUG_H
 
+void dlm_print_one_mle(struct dlm_master_list_entry *mle);
+
 #ifdef CONFIG_DEBUG_FS
 
 struct dlm_debug_ctxt {
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 94cadcb..efc015c 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -48,6 +48,7 @@
 #include "dlmapi.h"
 #include "dlmcommon.h"
 #include "dlmdomain.h"
+#include "dlmdebug.h"
 
 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_MASTER)
 #include "cluster/masklog.h"
@@ -91,94 +92,6 @@ static inline int dlm_mle_equal(struct dlm_ctxt *dlm,
 	return 1;
 }
 
-#define dlm_print_nodemap(m)  _dlm_print_nodemap(m,#m)
-static void _dlm_print_nodemap(unsigned long *map, const char *mapname)
-{
-	int i;
-	printk("%s=[ ", mapname);
-	for (i=0; i<O2NM_MAX_NODES; i++)
-		if (test_bit(i, map))
-			printk("%d ", i);
-	printk("]");
-}
-
-static void dlm_print_one_mle(struct dlm_master_list_entry *mle)
-{
-	int refs;
-	char *type;
-	char attached;
-	u8 master;
-	unsigned int namelen;
-	const char *name;
-	struct kref *k;
-	unsigned long *maybe = mle->maybe_map,
-		      *vote = mle->vote_map,
-		      *resp = mle->response_map,
-		      *node = mle->node_map;
-
-	k = &mle->mle_refs;
-	if (mle->type == DLM_MLE_BLOCK)
-		type = "BLK";
-	else if (mle->type == DLM_MLE_MASTER)
-		type = "MAS";
-	else
-		type = "MIG";
-	refs = atomic_read(&k->refcount);
-	master = mle->master;
-	attached = (list_empty(&mle->hb_events) ? 'N' : 'Y');
-
-	if (mle->type != DLM_MLE_MASTER) {
-		namelen = mle->u.name.len;
-		name = mle->u.name.name;
-	} else {
-		namelen = mle->u.res->lockname.len;
-		name = mle->u.res->lockname.name;
-	}
-
-	mlog(ML_NOTICE, "%.*s: %3s refs=%3d mas=%3u new=%3u evt=%c inuse=%d ",
-		  namelen, name, type, refs, master, mle->new_master, attached,
-		  mle->inuse);
-	dlm_print_nodemap(maybe);
-	printk(", ");
-	dlm_print_nodemap(vote);
-	printk(", ");
-	dlm_print_nodemap(resp);
-	printk(", ");
-	dlm_print_nodemap(node);
-	printk(", ");
-	printk("\n");
-}
-
-#if 0
-/* Code here is included but defined out as it aids debugging */
-
-static void dlm_dump_mles(struct dlm_ctxt *dlm)
-{
-	struct dlm_master_list_entry *mle;
-	
-	mlog(ML_NOTICE, "dumping all mles for domain %s:\n", dlm->name);
-	spin_lock(&dlm->master_lock);
-	list_for_each_entry(mle, &dlm->master_list, list)
-		dlm_print_one_mle(mle);
-	spin_unlock(&dlm->master_lock);
-}
-
-int dlm_dump_all_mles(const char __user *data, unsigned int len)
-{
-	struct dlm_ctxt *dlm;
-
-	spin_lock(&dlm_domain_lock);
-	list_for_each_entry(dlm, &dlm_domains, list) {
-		mlog(ML_NOTICE, "found dlm: %p, name=%s\n", dlm, dlm->name);
-		dlm_dump_mles(dlm);
-	}
-	spin_unlock(&dlm_domain_lock);
-	return len;
-}
-EXPORT_SYMBOL_GPL(dlm_dump_all_mles);
-
-#endif  /*  0  */
-
 static struct kmem_cache *dlm_lockres_cache = NULL;
 static struct kmem_cache *dlm_lockname_cache = NULL;
 static struct kmem_cache *dlm_mle_cache = NULL;
-- 
1.5.4.1
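
After this patch, stringify_nodemap() sits outside the CONFIG_DEBUG_FS section next to dump_mle(). It walks a node bitmap with the kernel's find_next_bit() and emits each set node number followed by a space. Since find_next_bit() is kernel-only, the userspace sketch below swaps in a simple linear scan (next_set_bit(), hypothetical) while keeping the same loop shape and "%d " output format. Like the original, it assumes buf is large enough to hold every set bit.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical name; the kernel has its own BITS_PER_LONG. */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for find_next_bit(): index of the first set bit at or
 * after 'start', or 'size' if there is none. */
static int next_set_bit(const unsigned long *map, int size, int start)
{
	int i;

	for (i = start; i < size; i++)
		if (map[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG)))
			return i;
	return size;
}

/* Same loop and format as stringify_nodemap(). */
static int nodemap_to_string(const unsigned long *map, int maxnodes,
			     char *buf, int len)
{
	int out = 0;
	int i = -1;

	/* Unlike the kernel helper, make an empty map yield "". */
	buf[0] = '\0';
	while ((i = next_set_bit(map, maxnodes, i + 1)) < maxnodes)
		out += snprintf(buf + out, len - out, "%d ", i);

	return out;
}
```

With nodes 0, 3 and 70 set, the output is "0 3 70 ", which is the form the Maybe=, Vote=, Response= and Node= lines of mle_state take.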


* [Ocfs2-devel] [PATCH 47/62] ocfs2/dlm: Fix lockname in lockres print function
From: Mark Fasheh @ 2008-04-02 20:14 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

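__dlm_print_one_lock_resource() was printing the lockname incorrectly: it
passed the raw name bytes to %.*s. It now formats the name with
stringify_lockname(). The output also gains the lockres refcount, its
dirty, recovery and migration state, and the per-lock pending flags.
Also, we now use printk directly instead of mlog, as the latter prefixes
each line with its calling context, which is not useful for this output.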

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmdebug.c |  126 +++++++++++++++++++----------------------------
 1 files changed, 51 insertions(+), 75 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 58e4579..53a9e60 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -5,7 +5,7 @@
  *
  * debug functionality for the dlm
  *
- * Copyright (C) 2004 Oracle.  All rights reserved.
+ * Copyright (C) 2004, 2008 Oracle.  All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public
@@ -44,11 +44,10 @@
 #define MLOG_MASK_PREFIX ML_DLM
 #include "cluster/masklog.h"
 
+int stringify_lockname(const char *lockname, int locklen, char *buf, int len);
+
 void dlm_print_one_lock_resource(struct dlm_lock_resource *res)
 {
-	mlog(ML_NOTICE, "lockres: %.*s, owner=%u, state=%u\n",
-	       res->lockname.len, res->lockname.name,
-	       res->owner, res->state);
 	spin_lock(&res->spinlock);
 	__dlm_print_one_lock_resource(res);
 	spin_unlock(&res->spinlock);
@@ -59,75 +58,78 @@ static void dlm_print_lockres_refmap(struct dlm_lock_resource *res)
 	int bit;
 	assert_spin_locked(&res->spinlock);
 
-	mlog(ML_NOTICE, "  refmap nodes: [ ");
+	printk(KERN_NOTICE "  refmap nodes: [ ");
 	bit = 0;
 	while (1) {
 		bit = find_next_bit(res->refmap, O2NM_MAX_NODES, bit);
 		if (bit >= O2NM_MAX_NODES)
 			break;
-		printk("%u ", bit);
+		printk(KERN_NOTICE "%u ", bit);
 		bit++;
 	}
-	printk("], inflight=%u\n", res->inflight_locks);
+	printk(KERN_NOTICE "], inflight=%u\n", res->inflight_locks);
+}
+
+static void __dlm_print_lock(struct dlm_lock *lock)
+{
+	spin_lock(&lock->spinlock);
+
+	printk(KERN_NOTICE "    type=%d, conv=%d, node=%u, cookie=%u:%llu, "
+	       "ref=%u, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c), "
+	       "pending=(conv=%c,lock=%c,cancel=%c,unlock=%c)\n",
+	       lock->ml.type, lock->ml.convert_type, lock->ml.node,
+	       dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
+	       dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)),
+	       atomic_read(&lock->lock_refs.refcount),
+	       (list_empty(&lock->ast_list) ? 'y' : 'n'),
+	       (lock->ast_pending ? 'y' : 'n'),
+	       (list_empty(&lock->bast_list) ? 'y' : 'n'),
+	       (lock->bast_pending ? 'y' : 'n'),
+	       (lock->convert_pending ? 'y' : 'n'),
+	       (lock->lock_pending ? 'y' : 'n'),
+	       (lock->cancel_pending ? 'y' : 'n'),
+	       (lock->unlock_pending ? 'y' : 'n'));
+
+	spin_unlock(&lock->spinlock);
 }
 
 void __dlm_print_one_lock_resource(struct dlm_lock_resource *res)
 {
 	struct list_head *iter2;
 	struct dlm_lock *lock;
+	char buf[DLM_LOCKID_NAME_MAX];
 
 	assert_spin_locked(&res->spinlock);
 
-	mlog(ML_NOTICE, "lockres: %.*s, owner=%u, state=%u\n",
-	       res->lockname.len, res->lockname.name,
-	       res->owner, res->state);
-	mlog(ML_NOTICE, "  last used: %lu, on purge list: %s\n",
-	     res->last_used, list_empty(&res->purge) ? "no" : "yes");
+	stringify_lockname(res->lockname.name, res->lockname.len,
+			   buf, sizeof(buf) - 1);
+	printk(KERN_NOTICE "lockres: %s, owner=%u, state=%u\n",
+	       buf, res->owner, res->state);
+	printk(KERN_NOTICE "  last used: %lu, refcnt: %u, on purge list: %s\n",
+	       res->last_used, atomic_read(&res->refs.refcount),
+	       list_empty(&res->purge) ? "no" : "yes");
+	printk(KERN_NOTICE "  on dirty list: %s, on reco list: %s, "
+	       "migrating pending: %s\n",
+	       list_empty(&res->dirty) ? "no" : "yes",
+	       list_empty(&res->recovering) ? "no" : "yes",
+	       res->migration_pending ? "yes" : "no");
+	printk(KERN_NOTICE "  inflight locks: %d, asts reserved: %d\n",
+	       res->inflight_locks, atomic_read(&res->asts_reserved));
 	dlm_print_lockres_refmap(res);
-	mlog(ML_NOTICE, "  granted queue: \n");
+	printk(KERN_NOTICE "  granted queue:\n");
 	list_for_each(iter2, &res->granted) {
 		lock = list_entry(iter2, struct dlm_lock, list);
-		spin_lock(&lock->spinlock);
-		mlog(ML_NOTICE, "    type=%d, conv=%d, node=%u, "
-		       "cookie=%u:%llu, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c)\n", 
-		       lock->ml.type, lock->ml.convert_type, lock->ml.node, 
-		     dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
-		     dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)),
-		       list_empty(&lock->ast_list) ? 'y' : 'n',
-		       lock->ast_pending ? 'y' : 'n',
-		       list_empty(&lock->bast_list) ? 'y' : 'n',
-		       lock->bast_pending ? 'y' : 'n');
-		spin_unlock(&lock->spinlock);
+		__dlm_print_lock(lock);
 	}
-	mlog(ML_NOTICE, "  converting queue: \n");
+	printk(KERN_NOTICE "  converting queue:\n");
 	list_for_each(iter2, &res->converting) {
 		lock = list_entry(iter2, struct dlm_lock, list);
-		spin_lock(&lock->spinlock);
-		mlog(ML_NOTICE, "    type=%d, conv=%d, node=%u, "
-		       "cookie=%u:%llu, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c)\n", 
-		       lock->ml.type, lock->ml.convert_type, lock->ml.node, 
-		     dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
-		     dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)),
-		       list_empty(&lock->ast_list) ? 'y' : 'n',
-		       lock->ast_pending ? 'y' : 'n',
-		       list_empty(&lock->bast_list) ? 'y' : 'n',
-		       lock->bast_pending ? 'y' : 'n');
-		spin_unlock(&lock->spinlock);
+		__dlm_print_lock(lock);
 	}
-	mlog(ML_NOTICE, "  blocked queue: \n");
+	printk(KERN_NOTICE "  blocked queue:\n");
 	list_for_each(iter2, &res->blocked) {
 		lock = list_entry(iter2, struct dlm_lock, list);
-		spin_lock(&lock->spinlock);
-		mlog(ML_NOTICE, "    type=%d, conv=%d, node=%u, "
-		       "cookie=%u:%llu, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c)\n", 
-		       lock->ml.type, lock->ml.convert_type, lock->ml.node, 
-		     dlm_get_lock_cookie_node(be64_to_cpu(lock->ml.cookie)),
-		     dlm_get_lock_cookie_seq(be64_to_cpu(lock->ml.cookie)),
-		       list_empty(&lock->ast_list) ? 'y' : 'n',
-		       lock->ast_pending ? 'y' : 'n',
-		       list_empty(&lock->bast_list) ? 'y' : 'n',
-		       lock->bast_pending ? 'y' : 'n');
-		spin_unlock(&lock->spinlock);
+		__dlm_print_lock(lock);
 	}
 }
 
@@ -137,31 +139,6 @@ void dlm_print_one_lock(struct dlm_lock *lockid)
 }
 EXPORT_SYMBOL_GPL(dlm_print_one_lock);
 
-#if 0
-void dlm_dump_lock_resources(struct dlm_ctxt *dlm)
-{
-	struct dlm_lock_resource *res;
-	struct hlist_node *iter;
-	struct hlist_head *bucket;
-	int i;
-
-	mlog(ML_NOTICE, "struct dlm_ctxt: %s, node=%u, key=%u\n",
-		  dlm->name, dlm->node_num, dlm->key);
-	if (!dlm || !dlm->name) {
-		mlog(ML_ERROR, "dlm=%p\n", dlm);
-		return;
-	}
-
-	spin_lock(&dlm->spinlock);
-	for (i=0; i<DLM_HASH_BUCKETS; i++) {
-		bucket = dlm_lockres_hash(dlm, i);
-		hlist_for_each_entry(res, iter, bucket, hash_node)
-			dlm_print_one_lock_resource(res);
-	}
-	spin_unlock(&dlm->spinlock);
-}
-#endif  /*  0  */
-
 static const char *dlm_errnames[] = {
 	[DLM_NORMAL] =			"DLM_NORMAL",
 	[DLM_GRANTED] =			"DLM_GRANTED",
@@ -274,8 +251,7 @@ EXPORT_SYMBOL_GPL(dlm_errname);
  *
  * For more on lockname formats, please refer to dlmglue.c and ocfs2_lockid.h.
  */
-static int stringify_lockname(const char *lockname, int locklen,
-			      char *buf, int len)
+int stringify_lockname(const char *lockname, int locklen, char *buf, int len)
 {
 	int out = 0;
 	__be64 inode_blkno_be;
-- 
1.5.4.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Ocfs2-devel] [PATCH 48/62] ocfs2/dlm: Cleanup lockres print
  2008-04-02 20:14                                                                                             ` [Ocfs2-devel] [PATCH 47/62] ocfs2/dlm: Fix lockname in lockres print function Mark Fasheh
@ 2008-04-02 20:14                                                                                               ` Mark Fasheh
  2008-04-02 20:14                                                                                                 ` [Ocfs2-devel] [PATCH 49/62] ocfs2: Reconnect after idle time out Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Sunil Mushran

From: Sunil Mushran <sunil.mushran@oracle.com>

A previous patch added KERN_NOTICE to the printks that print the lockres,
which cluttered the output. This patch removes the log level. For people
concerned with syslog clutter, please note that we now use this facility to
print the lockres only on error.
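Why a log level on every fragment clutters the output: printk of this era
recognizes a "<N>" level token only at the start of a new line, so the
KERN_NOTICE ("<5>") prefixes on the continuation printks in
dlm_print_lockres_refmap() end up as literal text in the middle of the line.
The following is a simplified userspace model of that behaviour, not the
kernel's code; struct model_log, model_printk() and MODEL_KERN_NOTICE are
hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_KERN_NOTICE "<5>"	/* hypothetical stand-in for KERN_NOTICE */

/* Simplified model of the kernel log buffer (userspace only). */
struct model_log {
	char buf[256];
	size_t len;
	int line_start;	/* nonzero when the next char begins a new line */
};

static void model_log_init(struct model_log *log)
{
	log->buf[0] = '\0';
	log->len = 0;
	log->line_start = 1;
}

/*
 * Append one message. A "<N>" level token is consumed only when it
 * appears at the start of a line; anywhere else it is copied verbatim.
 */
static void model_printk(struct model_log *log, const char *msg)
{
	const char *p;

	for (p = msg; *p; p++) {
		if (log->line_start) {
			if (p[0] == '<' && p[1] >= '0' && p[1] <= '7' &&
			    p[2] == '>')
				p += 3;		/* swallow the level token */
			log->line_start = 0;
			if (*p == '\0')
				break;
		}
		if (log->len + 1 >= sizeof(log->buf))
			break;		/* buffer full: drop the rest */
		log->buf[log->len++] = *p;
		if (*p == '\n')
			log->line_start = 1;
	}
	log->buf[log->len] = '\0';
}
```

Printing the refmap as three fragments, each with MODEL_KERN_NOTICE, yields
"  refmap nodes: [ <5>1 <5>], inflight=0"; dropping the prefix from the
continuation fragments yields the clean "  refmap nodes: [ 1 ], inflight=0".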

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/dlm/dlmdebug.c |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 53a9e60..5f6d858 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -58,23 +58,23 @@ static void dlm_print_lockres_refmap(struct dlm_lock_resource *res)
 	int bit;
 	assert_spin_locked(&res->spinlock);
 
-	printk(KERN_NOTICE "  refmap nodes: [ ");
+	printk("  refmap nodes: [ ");
 	bit = 0;
 	while (1) {
 		bit = find_next_bit(res->refmap, O2NM_MAX_NODES, bit);
 		if (bit >= O2NM_MAX_NODES)
 			break;
-		printk(KERN_NOTICE "%u ", bit);
+		printk("%u ", bit);
 		bit++;
 	}
-	printk(KERN_NOTICE "], inflight=%u\n", res->inflight_locks);
+	printk("], inflight=%u\n", res->inflight_locks);
 }
 
 static void __dlm_print_lock(struct dlm_lock *lock)
 {
 	spin_lock(&lock->spinlock);
 
-	printk(KERN_NOTICE "    type=%d, conv=%d, node=%u, cookie=%u:%llu, "
+	printk("    type=%d, conv=%d, node=%u, cookie=%u:%llu, "
 	       "ref=%u, ast=(empty=%c,pend=%c), bast=(empty=%c,pend=%c), "
 	       "pending=(conv=%c,lock=%c,cancel=%c,unlock=%c)\n",
 	       lock->ml.type, lock->ml.convert_type, lock->ml.node,
@@ -103,30 +103,30 @@ void __dlm_print_one_lock_resource(struct dlm_lock_resource *res)
 
 	stringify_lockname(res->lockname.name, res->lockname.len,
 			   buf, sizeof(buf) - 1);
-	printk(KERN_NOTICE "lockres: %s, owner=%u, state=%u\n",
+	printk("lockres: %s, owner=%u, state=%u\n",
 	       buf, res->owner, res->state);
-	printk(KERN_NOTICE "  last used: %lu, refcnt: %u, on purge list: %s\n",
+	printk("  last used: %lu, refcnt: %u, on purge list: %s\n",
 	       res->last_used, atomic_read(&res->refs.refcount),
 	       list_empty(&res->purge) ? "no" : "yes");
-	printk(KERN_NOTICE "  on dirty list: %s, on reco list: %s, "
+	printk("  on dirty list: %s, on reco list: %s, "
 	       "migrating pending: %s\n",
 	       list_empty(&res->dirty) ? "no" : "yes",
 	       list_empty(&res->recovering) ? "no" : "yes",
 	       res->migration_pending ? "yes" : "no");
-	printk(KERN_NOTICE "  inflight locks: %d, asts reserved: %d\n",
+	printk("  inflight locks: %d, asts reserved: %d\n",
 	       res->inflight_locks, atomic_read(&res->asts_reserved));
 	dlm_print_lockres_refmap(res);
-	printk(KERN_NOTICE "  granted queue:\n");
+	printk("  granted queue:\n");
 	list_for_each(iter2, &res->granted) {
 		lock = list_entry(iter2, struct dlm_lock, list);
 		__dlm_print_lock(lock);
 	}
-	printk(KERN_NOTICE "  converting queue:\n");
+	printk("  converting queue:\n");
 	list_for_each(iter2, &res->converting) {
 		lock = list_entry(iter2, struct dlm_lock, list);
 		__dlm_print_lock(lock);
 	}
-	printk(KERN_NOTICE "  blocked queue:\n");
+	printk("  blocked queue:\n");
 	list_for_each(iter2, &res->blocked) {
 		lock = list_entry(iter2, struct dlm_lock, list);
 		__dlm_print_lock(lock);
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 49/62] ocfs2: Reconnect after idle time out.
  2008-04-02 20:14                                                                                               ` [Ocfs2-devel] [PATCH 48/62] ocfs2/dlm: Cleanup lockres print Mark Fasheh
@ 2008-04-02 20:14                                                                                                 ` Mark Fasheh
  2008-04-02 20:15                                                                                                   ` [Ocfs2-devel] [PATCH 50/62] sysfs: Allow removal of symlinks in the sysfs root Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Tao Ma

From: Tao Ma <tao.ma@oracle.com>

Currently, o2net connects to a node on hb_up and disconnects on
hb_down and net timeout.

Disconnecting on net timeout is fine, but o2net should then attempt to
reconnect. This is because nodes sometimes get overloaded enough that
the network connection breaks but the disk hb does not. If we get
into that situation, we either fence (unnecessarily) or wait for its
disk hb to die (and sometimes hang in the process).

So in this updated scheme, when the network disconnects, we keep
attempting to reconnect till we succeed or we get a disk hb down
event.

If the other node is really dead, then we will eventually get a
node down event. If not, we should be able to connect again and
continue.
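The decision in o2net_start_connect() can be read as a small predicate. In
the hunks below, nn_timeout is set to 1 only by o2net_idle_timer() and is
reset to 0 on a successful handshake, on accept, on heartbeat node up, on an
explicit o2net_disconnect_node() and at init. A minimal userspace sketch of
the new `stop` expression, with locking and work queues left out;
model_should_stop() is a hypothetical name.

```c
#include <errno.h>

/*
 * Mirrors the stop test in o2net_start_connect(): returns nonzero when
 * the queued connect attempt should give up.
 *   have_sc          - a socket container is already pending/connected
 *   persistent_error - nn_persistent_error (0 if none)
 *   timeout          - nn_timeout, 1 only after an idle timeout
 */
static int model_should_stop(int have_sc, int persistent_error,
			     unsigned int timeout)
{
	return have_sc ||
	       (persistent_error &&
		(persistent_error != -ENOTCONN || timeout == 0));
}
```

The only case in which an existing error no longer stops the attempt is
-ENOTCONN combined with nn_timeout == 1, i.e. a connection that was up and
was closed by the idle timer; any other persistent error still waits for the
next heartbeat event.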

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/cluster/tcp.c          |   51 +++++++++++++++++++++++++++-----------
 fs/ocfs2/cluster/tcp_internal.h |    2 +
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index b8057c5..4ea4b0a 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -399,8 +399,6 @@ static void o2net_set_nn_state(struct o2net_node *nn,
 	mlog_bug_on_msg(err && valid, "err %d valid %u\n", err, valid);
 	mlog_bug_on_msg(valid && !sc, "valid %u sc %p\n", valid, sc);
 
-	/* we won't reconnect after our valid conn goes away for
-	 * this hb iteration.. here so it shows up in the logs */
 	if (was_valid && !valid && err == 0)
 		err = -ENOTCONN;
 
@@ -430,11 +428,6 @@ static void o2net_set_nn_state(struct o2net_node *nn,
 
 	if (!was_valid && valid) {
 		o2quo_conn_up(o2net_num_from_nn(nn));
-		/* this is a bit of a hack.  we only try reconnecting
-		 * when heartbeating starts until we get a connection.
-		 * if that connection then dies we don't try reconnecting.
-		 * the only way to start connecting again is to down
-		 * heartbeat and bring it back up. */
 		cancel_delayed_work(&nn->nn_connect_expired);
 		printk(KERN_INFO "o2net: %s " SC_NODEF_FMT "\n",
 		       o2nm_this_node() > sc->sc_node->nd_num ?
@@ -457,6 +450,18 @@ static void o2net_set_nn_state(struct o2net_node *nn,
 			delay = 0;
 		mlog(ML_CONN, "queueing conn attempt in %lu jiffies\n", delay);
 		queue_delayed_work(o2net_wq, &nn->nn_connect_work, delay);
+
+		/*
+		 * Delay the expired work after idle timeout.
+		 *
+		 * We might have lots of failed connection attempts that run
+		 * through here but we only cancel the connect_expired work when
+		 * a connection attempt succeeds.  So only the first enqueue of
+		 * the connect_expired work will do anything.  The rest will see
+		 * that it's already queued and do nothing.
+		 */
+		delay += msecs_to_jiffies(o2net_idle_timeout(NULL));
+		queue_delayed_work(o2net_wq, &nn->nn_connect_expired, delay);
 	}
 
 	/* keep track of the nn's sc ref for the caller */
@@ -1193,6 +1198,7 @@ static int o2net_check_handshake(struct o2net_sock_container *sc)
 	 * shut down already */
 	if (nn->nn_sc == sc) {
 		o2net_sc_reset_idle_timer(sc);
+		atomic_set(&nn->nn_timeout, 0);
 		o2net_set_nn_state(nn, sc, 1, 0);
 	}
 	spin_unlock(&nn->nn_lock);
@@ -1391,6 +1397,7 @@ static void o2net_sc_send_keep_req(struct work_struct *work)
 static void o2net_idle_timer(unsigned long data)
 {
 	struct o2net_sock_container *sc = (struct o2net_sock_container *)data;
+	struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num);
 	struct timeval now;
 
 	do_gettimeofday(&now);
@@ -1413,6 +1420,12 @@ static void o2net_idle_timer(unsigned long data)
 	     sc->sc_tv_func_start.tv_sec, (long) sc->sc_tv_func_start.tv_usec,
 	     sc->sc_tv_func_stop.tv_sec, (long) sc->sc_tv_func_stop.tv_usec);
 
+	/*
+	 * Initialize the nn_timeout so that the next connection attempt
+	 * will continue in o2net_start_connect.
+	 */
+	atomic_set(&nn->nn_timeout, 1);
+
 	o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
 }
 
@@ -1447,6 +1460,7 @@ static void o2net_start_connect(struct work_struct *work)
 	struct socket *sock = NULL;
 	struct sockaddr_in myaddr = {0, }, remoteaddr = {0, };
 	int ret = 0, stop;
+	unsigned int timeout;
 
 	/* if we're greater we initiate tx, otherwise we accept */
 	if (o2nm_this_node() <= o2net_num_from_nn(nn))
@@ -1466,8 +1480,17 @@ static void o2net_start_connect(struct work_struct *work)
 	}
 
 	spin_lock(&nn->nn_lock);
-	/* see if we already have one pending or have given up */
-	stop = (nn->nn_sc || nn->nn_persistent_error);
+	/*
+	 * see if we already have one pending or have given up.
+	 * For nn_timeout, it is set when we close the connection
+	 * because of the idle time out. So it means that we have
+	 * at least connected to that node successfully once,
+	 * now try to connect to it again.
+	 */
+	timeout = atomic_read(&nn->nn_timeout);
+	stop = (nn->nn_sc ||
+		(nn->nn_persistent_error &&
+		(nn->nn_persistent_error != -ENOTCONN || timeout == 0)));
 	spin_unlock(&nn->nn_lock);
 	if (stop)
 		goto out;
@@ -1579,6 +1602,7 @@ void o2net_disconnect_node(struct o2nm_node *node)
 
 	/* don't reconnect until it's heartbeating again */
 	spin_lock(&nn->nn_lock);
+	atomic_set(&nn->nn_timeout, 0);
 	o2net_set_nn_state(nn, NULL, 0, -ENOTCONN);
 	spin_unlock(&nn->nn_lock);
 
@@ -1613,17 +1637,12 @@ static void o2net_hb_node_up_cb(struct o2nm_node *node, int node_num,
 		(msecs_to_jiffies(o2net_reconnect_delay(node)) + 1);
 
 	if (node_num != o2nm_this_node()) {
-		/* heartbeat doesn't work unless a local node number is
-		 * configured and doing so brings up the o2net_wq, so we can
-		 * use it.. */
-		queue_delayed_work(o2net_wq, &nn->nn_connect_expired,
-		                   msecs_to_jiffies(o2net_idle_timeout(node)));
-
 		/* believe it or not, accept and node hearbeating testing
 		 * can succeed for this node before we got here.. so
 		 * only use set_nn_state to clear the persistent error
 		 * if that hasn't already happened */
 		spin_lock(&nn->nn_lock);
+		atomic_set(&nn->nn_timeout, 0);
 		if (nn->nn_persistent_error)
 			o2net_set_nn_state(nn, NULL, 0, 0);
 		spin_unlock(&nn->nn_lock);
@@ -1747,6 +1766,7 @@ static int o2net_accept_one(struct socket *sock)
 	new_sock = NULL;
 
 	spin_lock(&nn->nn_lock);
+	atomic_set(&nn->nn_timeout, 0);
 	o2net_set_nn_state(nn, sc, 0, 0);
 	spin_unlock(&nn->nn_lock);
 
@@ -1941,6 +1961,7 @@ int o2net_init(void)
 	for (i = 0; i < ARRAY_SIZE(o2net_nodes); i++) {
 		struct o2net_node *nn = o2net_nn_from_num(i);
 
+		atomic_set(&nn->nn_timeout, 0);
 		spin_lock_init(&nn->nn_lock);
 		INIT_DELAYED_WORK(&nn->nn_connect_work, o2net_start_connect);
 		INIT_DELAYED_WORK(&nn->nn_connect_expired,
diff --git a/fs/ocfs2/cluster/tcp_internal.h b/fs/ocfs2/cluster/tcp_internal.h
index d25b9af..b4c5586 100644
--- a/fs/ocfs2/cluster/tcp_internal.h
+++ b/fs/ocfs2/cluster/tcp_internal.h
@@ -95,6 +95,8 @@ struct o2net_node {
 	unsigned			nn_sc_valid:1;
 	/* if this is set tx just returns it */
 	int				nn_persistent_error;
+	/* It is only set to 1 after the idle time out. */
+	atomic_t			nn_timeout;
 
 	/* threads waiting for an sc to arrive wait on the wq for generation
 	 * to increase.  it is increased when a connecting socket succeeds
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 50/62] sysfs: Allow removal of symlinks in the sysfs root
  2008-04-02 20:14                                                                                                 ` [Ocfs2-devel] [PATCH 49/62] ocfs2: Reconnect after idle time out Mark Fasheh
@ 2008-04-02 20:15                                                                                                   ` Mark Fasheh
  2008-04-02 20:15                                                                                                     ` [Ocfs2-devel] [PATCH 51/62] ocfs2: Move /sys/o2cb to /sys/fs/o2cb Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

Allow callers of sysfs_remove_link() to pass a NULL kobj, in which case
sysfs_root will be used as the parent directory. This allows us to tear down
top level symlinks created via sysfs_create_link(), which already has
similar handling of a NULL parent object.
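The parent selection in the patched sysfs_remove_link() reduces to a single
fallback, which makes sysfs_remove_link(NULL, name) the teardown counterpart
of sysfs_create_link(NULL, target, name). A userspace sketch of just that
choice; struct model_dirent, struct model_kobject, model_sysfs_root and
model_link_parent() are hypothetical stand-ins for the sysfs types, not the
real API.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct sysfs_dirent and struct kobject. */
struct model_dirent { const char *name; };
struct model_kobject { struct model_dirent *sd; };

static struct model_dirent model_sysfs_root = { "" };

/* A NULL kobj means "top-level link": use the sysfs root directory. */
static struct model_dirent *model_link_parent(struct model_kobject *kobj)
{
	return kobj ? kobj->sd : &model_sysfs_root;
}
```

The next patch in this series relies on exactly this pairing for the /sys/o2cb
symlink: it creates the link with a NULL kobj in o2cb_sys_init() and removes
it the same way in o2cb_sys_shutdown().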

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 fs/sysfs/symlink.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 5f66c44..817f596 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -87,7 +87,14 @@ int sysfs_create_link(struct kobject * kobj, struct kobject * target, const char
 
 void sysfs_remove_link(struct kobject * kobj, const char * name)
 {
-	sysfs_hash_and_remove(kobj->sd, name);
+	struct sysfs_dirent *parent_sd = NULL;
+
+	if (!kobj)
+		parent_sd = &sysfs_root;
+	else
+		parent_sd = kobj->sd;
+
+	sysfs_hash_and_remove(parent_sd, name);
 }
 
 static int sysfs_get_target_path(struct sysfs_dirent *parent_sd,
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 51/62] ocfs2: Move /sys/o2cb to /sys/fs/o2cb
  2008-04-02 20:15                                                                                                   ` [Ocfs2-devel] [PATCH 50/62] sysfs: Allow removal of symlinks in the sysfs root Mark Fasheh
@ 2008-04-02 20:15                                                                                                     ` Mark Fasheh
  2008-04-02 20:15                                                                                                       ` [Ocfs2-devel] [PATCH 52/62] ocfs2: Add support for cross extent block Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker

/sys/fs is where we really want file system specific sysfs objects.

Ocfs2-tools has been updated to look in /sys/fs/o2cb. We can maintain
backwards compatibility with old ocfs2-tools by using a sysfs symlink. After
some time (2 years), the symlink can be safely removed. This patch also adds
documentation to make it easier for people to figure out what /sys/fs/o2cb
is used for.
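For user-space tools, the compatible lookup order is: try /sys/fs/o2cb first,
then fall back to the /sys/o2cb symlink, which exists only for older tools
during the transition. A minimal POSIX sketch; o2cb_sysfs_dir() is a
hypothetical helper, not an ocfs2-tools function.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Return the o2cb sysfs directory, preferring the new location.
 * Returns NULL if neither path exists (e.g. the o2cb module is not
 * loaded).
 */
static const char *o2cb_sysfs_dir(void)
{
	static const char *const candidates[] = {
		"/sys/fs/o2cb",	/* current location */
		"/sys/o2cb",	/* compatibility symlink, scheduled for removal */
	};
	size_t i;

	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		if (access(candidates[i], F_OK) == 0)
			return candidates[i];
	return NULL;
}
```

Checking the new path first means such a tool keeps working after the
symlink is removed.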

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 Documentation/ABI/obsolete/o2cb            |   11 +++++++++++
 Documentation/ABI/stable/o2cb              |   10 ++++++++++
 Documentation/feature-removal-schedule.txt |   10 ++++++++++
 fs/ocfs2/cluster/sys.c                     |    9 +++++++++
 4 files changed, 40 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/obsolete/o2cb
 create mode 100644 Documentation/ABI/stable/o2cb

diff --git a/Documentation/ABI/obsolete/o2cb b/Documentation/ABI/obsolete/o2cb
new file mode 100644
index 0000000..9c49d8e
--- /dev/null
+++ b/Documentation/ABI/obsolete/o2cb
@@ -0,0 +1,11 @@
+What:		/sys/o2cb symlink
+Date:		Dec 2005
+KernelVersion:	2.6.16
+Contact:	ocfs2-devel at oss.oracle.com
+Description:	This is a symlink: /sys/o2cb to /sys/fs/o2cb. The symlink will
+		be removed when new versions of ocfs2-tools which know to look
+		in /sys/fs/o2cb are sufficiently prevalent. Don't code new
+		software to look here, it should try /sys/fs/o2cb instead.
+		See Documentation/ABI/stable/o2cb for more information on usage.
+Users:		ocfs2-tools. It's sufficient to mail proposed changes to
+		ocfs2-devel at oss.oracle.com.
diff --git a/Documentation/ABI/stable/o2cb b/Documentation/ABI/stable/o2cb
new file mode 100644
index 0000000..5eb1545
--- /dev/null
+++ b/Documentation/ABI/stable/o2cb
@@ -0,0 +1,10 @@
+What:		/sys/fs/o2cb/ (was /sys/o2cb)
+Date:		Dec 2005
+KernelVersion:	2.6.16
+Contact:	ocfs2-devel at oss.oracle.com
+Description:	Ocfs2-tools looks at 'interface-revision' for versioning
+		information. Each logmask/ file controls a set of debug prints
+		and can be written into with the strings "allow", "deny", or
+		"off". Reading the file returns the current state.
+Users:		ocfs2-tools. It's sufficient to mail proposed changes to
+		ocfs2-devel at oss.oracle.com.
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index bf0e3df..4101f1f 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -318,3 +318,13 @@ Why:	Not used in-tree. The current out-of-tree users used it to
 	code / infrastructure should be in the kernel and not in some
 	out-of-tree driver.
 Who:	Thomas Gleixner <tglx@linutronix.de>
+
+---------------------------
+
+What:	/sys/o2cb symlink
+When:	January 2010
+Why:	/sys/fs/o2cb is the proper location for this information - /sys/o2cb
+	exists as a symlink for backwards compatibility for old versions of
+	ocfs2-tools. 2 years should be sufficient time to phase in new versions
+	which know to look in /sys/fs/o2cb.
+Who:	ocfs2-devel at oss.oracle.com
diff --git a/fs/ocfs2/cluster/sys.c b/fs/ocfs2/cluster/sys.c
index 0c095ce..98429fd 100644
--- a/fs/ocfs2/cluster/sys.c
+++ b/fs/ocfs2/cluster/sys.c
@@ -57,6 +57,7 @@ static struct kset *o2cb_kset;
 void o2cb_sys_shutdown(void)
 {
 	mlog_sys_shutdown();
+	sysfs_remove_link(NULL, "o2cb");
 	kset_unregister(o2cb_kset);
 }
 
@@ -68,6 +69,14 @@ int o2cb_sys_init(void)
 	if (!o2cb_kset)
 		return -ENOMEM;
 
+	/*
+	 * Create this symlink for backwards compatibility with old
+	 * versions of ocfs2-tools which look for things in /sys/o2cb.
+	 */
+	ret = sysfs_create_link(NULL, &o2cb_kset->kobj, "o2cb");
+	if (ret)
+		goto error;
+
 	ret = sysfs_create_group(&o2cb_kset->kobj, &o2cb_attr_group);
 	if (ret)
 		goto error;
-- 
1.5.4.1


* [Ocfs2-devel] [PATCH 52/62] ocfs2: Add support for cross extent block
  2008-04-02 20:15                                                                                                     ` [Ocfs2-devel] [PATCH 51/62] ocfs2: Move /sys/o2cb to /sys/fs/o2cb Mark Fasheh
@ 2008-04-02 20:15                                                                                                       ` Mark Fasheh
  2008-04-02 20:15                                                                                                         ` [Ocfs2-devel] [PATCH 53/62] ocfs2: Enable cross extent block merge Mark Fasheh
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Fasheh @ 2008-04-02 20:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Tao Ma

From: Tao Ma <tao.ma@oracle.com>

In ocfs2_merge_rec_right, when we find that the merge extent is
"CONTIG_RIGHT" with the first extent record of the next extent block, we
will merge it into the next extent block and change all the related extent
blocks accordingly.

In ocfs2_merge_rec_left, when we find that the merge extent is "CONTIG_LEFT"
with the last extent record of the previous extent block, we will merge it
into the previous extent block and change all the related extent blocks
accordingly.

As for CONTIG_LEFTRIGHT, we will handle CONTIG_RIGHT first so that when
the index is zero, the merge process will be simpler and more efficient.
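How the merge target is chosen, as a simplified userspace model of the
ocfs2_merge_rec_right() and ocfs2_merge_rec_left() hunks below: within a leaf
the neighbour is the adjacent record; at the edge of a full leaf it is the
first non-empty record of the next leaf, or the last record of the previous
leaf. The model uses host-endian fields, no journaling and no path handling;
all model_* names and MODEL_L_COUNT are hypothetical, and assert() stands in
for BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_L_COUNT 4	/* hypothetical: records per leaf block */

struct model_rec {
	uint32_t cpos;		/* first logical cluster */
	uint64_t blkno;		/* first physical block */
	uint16_t clusters;	/* length in clusters; 0 = empty extent */
};

struct model_leaf {
	uint16_t next_free;	/* records in use */
	struct model_rec recs[MODEL_L_COUNT];
};

static int model_is_empty(const struct model_rec *r)
{
	return r->clusters == 0;
}

/* Record after left->recs[index]: same leaf, or first one in the next leaf. */
static struct model_rec *model_right_neighbor(struct model_leaf *left,
					      struct model_leaf *next, int index)
{
	if (index == left->next_free - 1 && left->next_free == MODEL_L_COUNT) {
		assert(next && next->next_free > 0);
		if (model_is_empty(&next->recs[0])) {
			assert(next->next_free > 1);
			return &next->recs[1];	/* skip the empty extent */
		}
		return &next->recs[0];
	}
	assert(index < left->next_free - 1);
	return &left->recs[index + 1];
}

/* Record before right->recs[index]: same leaf, or last one in the previous leaf. */
static struct model_rec *model_left_neighbor(struct model_leaf *right,
					     struct model_leaf *prev, int index)
{
	assert(index >= 0);
	if (index == 0) {
		assert(prev && prev->next_free == MODEL_L_COUNT);
		return &prev->recs[prev->next_free - 1];
	}
	return &right->recs[index - 1];
}

/* Move the last split_clusters of *l onto the front of *r. */
static void model_merge_right(struct model_rec *l, struct model_rec *r,
			      uint16_t split_clusters,
			      unsigned int blocks_per_cluster)
{
	assert(l->cpos + l->clusters == r->cpos);	/* logically contiguous */
	assert(split_clusters <= l->clusters);
	l->clusters -= split_clusters;
	r->cpos -= split_clusters;
	r->blkno -= (uint64_t)split_clusters * blocks_per_cluster;
	r->clusters += split_clusters;
}
```

In the real code the cross-block case additionally journals every node from
the subtree root down both paths and finishes with
ocfs2_complete_edge_insert(), because moving the boundary between two leaves
changes the cpos recorded in their common ancestors.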

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/alloc.c |  366 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 325 insertions(+), 41 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 447206e..f63cb32 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -1450,6 +1450,8 @@ static void ocfs2_adjust_root_records(struct ocfs2_extent_list *root_el,
  *   - When our insert into the right path leaf is at the leftmost edge
  *     and requires an update of the path immediately to it's left. This
  *     can occur at the end of some types of rotation and appending inserts.
+ *   - When we've adjusted the last extent record in the left path leaf and the
+ *     1st extent record in the right path leaf during cross extent block merge.
  */
 static void ocfs2_complete_edge_insert(struct inode *inode, handle_t *handle,
 				       struct ocfs2_path *left_path,
@@ -2712,24 +2714,147 @@ static void ocfs2_cleanup_merge(struct ocfs2_extent_list *el,
 	}
 }
 
+static int ocfs2_get_right_path(struct inode *inode,
+				struct ocfs2_path *left_path,
+				struct ocfs2_path **ret_right_path)
+{
+	int ret;
+	u32 right_cpos;
+	struct ocfs2_path *right_path = NULL;
+	struct ocfs2_extent_list *left_el;
+
+	*ret_right_path = NULL;
+
+	/* This function shouldn't be called for non-trees. */
+	BUG_ON(left_path->p_tree_depth == 0);
+
+	left_el = path_leaf_el(left_path);
+	BUG_ON(left_el->l_next_free_rec != left_el->l_count);
+
+	ret = ocfs2_find_cpos_for_right_leaf(inode->i_sb, left_path,
+					     &right_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	/* This function shouldn't be called for the rightmost leaf. */
+	BUG_ON(right_cpos == 0);
+
+	right_path = ocfs2_new_path(path_root_bh(left_path),
+				    path_root_el(left_path));
+	if (!right_path) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
+		goto out;
+	}
+
+	ret = ocfs2_find_path(inode, right_path, right_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	*ret_right_path = right_path;
+out:
+	if (ret)
+		ocfs2_free_path(right_path);
+	return ret;
+}
+
 /*
  * Remove split_rec clusters from the record at index and merge them
- * onto the beginning of the record at index + 1.
+ * onto the beginning of the record "next" to it.
+ * For index < l_count - 1, the next means the extent rec@index + 1.
+ * For index == l_count - 1, the "next" means the 1st extent rec of the
+ * next extent block.
  */
-static int ocfs2_merge_rec_right(struct inode *inode, struct buffer_head *bh,
-				handle_t *handle,
-				struct ocfs2_extent_rec *split_rec,
-				struct ocfs2_extent_list *el, int index)
+static int ocfs2_merge_rec_right(struct inode *inode,
+				 struct ocfs2_path *left_path,
+				 handle_t *handle,
+				 struct ocfs2_extent_rec *split_rec,
+				 int index)
 {
-	int ret;
+	int ret, next_free, i;
 	unsigned int split_clusters = le16_to_cpu(split_rec->e_leaf_clusters);
 	struct ocfs2_extent_rec *left_rec;
 	struct ocfs2_extent_rec *right_rec;
+	struct ocfs2_extent_list *right_el;
+	struct ocfs2_path *right_path = NULL;
+	int subtree_index = 0;
+	struct ocfs2_extent_list *el = path_leaf_el(left_path);
+	struct buffer_head *bh = path_leaf_bh(left_path);
+	struct buffer_head *root_bh = NULL;
 
 	BUG_ON(index >= le16_to_cpu(el->l_next_free_rec));
-
 	left_rec = &el->l_recs[index];
-	right_rec = &el->l_recs[index + 1];
+
+	if (index == le16_to_cpu(el->l_next_free_rec - 1) &&
+	    le16_to_cpu(el->l_next_free_rec) == le16_to_cpu(el->l_count)) {
+		/* we meet with a cross extent block merge. */
+		ret = ocfs2_get_right_path(inode, left_path, &right_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		right_el = path_leaf_el(right_path);
+		next_free = le16_to_cpu(right_el->l_next_free_rec);
+		BUG_ON(next_free <= 0);
+		right_rec = &right_el->l_recs[0];
+		if (ocfs2_is_empty_extent(right_rec)) {
+			BUG_ON(le16_to_cpu(next_free) <= 1);
+			right_rec = &right_el->l_recs[1];
+		}
+
+		BUG_ON(le32_to_cpu(left_rec->e_cpos) +
+		       le16_to_cpu(left_rec->e_leaf_clusters) !=
+		       le32_to_cpu(right_rec->e_cpos));
+
+		subtree_index = ocfs2_find_subtree_root(inode,
+							left_path, right_path);
+
+		ret = ocfs2_extend_rotate_transaction(handle, subtree_index,
+						      handle->h_buffer_credits,
+						      right_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		root_bh = left_path->p_node[subtree_index].bh;
+		BUG_ON(root_bh != right_path->p_node[subtree_index].bh);
+
+		ret = ocfs2_journal_access(handle, inode, root_bh,
+					   OCFS2_JOURNAL_ACCESS_WRITE);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		for (i = subtree_index + 1;
+		     i < path_num_items(right_path); i++) {
+			ret = ocfs2_journal_access(handle, inode,
+						   right_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+
+			ret = ocfs2_journal_access(handle, inode,
+						   left_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+		}
+
+	} else {
+		BUG_ON(index == le16_to_cpu(el->l_next_free_rec) - 1);
+		right_rec = &el->l_recs[index + 1];
+	}
 
 	ret = ocfs2_journal_access(handle, inode, bh,
 				   OCFS2_JOURNAL_ACCESS_WRITE);
@@ -2751,30 +2876,156 @@ static int ocfs2_merge_rec_right(struct inode *inode, struct buffer_head *bh,
 	if (ret)
 		mlog_errno(ret);
 
+	if (right_path) {
+		ret = ocfs2_journal_dirty(handle, path_leaf_bh(right_path));
+		if (ret)
+			mlog_errno(ret);
+
+		ocfs2_complete_edge_insert(inode, handle, left_path,
+					   right_path, subtree_index);
+	}
+out:
+	if (right_path)
+		ocfs2_free_path(right_path);
+	return ret;
+}
+
+static int ocfs2_get_left_path(struct inode *inode,
+			       struct ocfs2_path *right_path,
+			       struct ocfs2_path **ret_left_path)
+{
+	int ret;
+	u32 left_cpos;
+	struct ocfs2_path *left_path = NULL;
+
+	*ret_left_path = NULL;
+
+	/* This function shouldn't be called for non-trees. */
+	BUG_ON(right_path->p_tree_depth == 0);
+
+	ret = ocfs2_find_cpos_for_left_leaf(inode->i_sb,
+					    right_path, &left_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	/* This function shouldn't be called for the leftmost leaf. */
+	BUG_ON(left_cpos == 0);
+
+	left_path = ocfs2_new_path(path_root_bh(right_path),
+				   path_root_el(right_path));
+	if (!left_path) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
+		goto out;
+	}
+
+	ret = ocfs2_find_path(inode, left_path, left_cpos);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	*ret_left_path = left_path;
 out:
+	if (ret)
+		ocfs2_free_path(left_path);
 	return ret;
 }
 
 /*
  * Remove split_rec clusters from the record at index and merge them
- * onto the tail of the record at index - 1.
+ * onto the tail of the record "before" it.
+ * For index > 0, the "before" means the extent rec at index - 1.
+ *
+ * For index == 0, the "before" means the last record of the previous
+ * extent block. And there is also a situation that we may need to
+ * remove the rightmost leaf extent block in the right_path and change
+ * the right path to indicate the new rightmost path.
  */
-static int ocfs2_merge_rec_left(struct inode *inode, struct buffer_head *bh,
+static int ocfs2_merge_rec_left(struct inode *inode,
+				struct ocfs2_path *right_path,
 				handle_t *handle,
 				struct ocfs2_extent_rec *split_rec,
-				struct ocfs2_extent_list *el, int index)
+				struct ocfs2_cached_dealloc_ctxt *dealloc,
+				int index)
 {
-	int ret, has_empty_extent = 0;
+	int ret, i, subtree_index = 0, has_empty_extent = 0;
 	unsigned int split_clusters = le16_to_cpu(split_rec->e_leaf_clusters);
 	struct ocfs2_extent_rec *left_rec;
 	struct ocfs2_extent_rec *right_rec;
+	struct ocfs2_extent_list *el = path_leaf_el(right_path);
+	struct buffer_head *bh = path_leaf_bh(right_path);
+	struct buffer_head *root_bh = NULL;
+	struct ocfs2_path *left_path = NULL;
+	struct ocfs2_extent_list *left_el;
 
-	BUG_ON(index <= 0);
+	BUG_ON(index < 0);
 
-	left_rec = &el->l_recs[index - 1];
 	right_rec = &el->l_recs[index];
-	if (ocfs2_is_empty_extent(&el->l_recs[0]))
-		has_empty_extent = 1;
+	if (index == 0) {
+		/* we meet with a cross extent block merge. */
+		ret = ocfs2_get_left_path(inode, right_path, &left_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		left_el = path_leaf_el(left_path);
+		BUG_ON(le16_to_cpu(left_el->l_next_free_rec) !=
+		       le16_to_cpu(left_el->l_count));
+
+		left_rec = &left_el->l_recs[
+				le16_to_cpu(left_el->l_next_free_rec) - 1];
+		BUG_ON(le32_to_cpu(left_rec->e_cpos) +
+		       le16_to_cpu(left_rec->e_leaf_clusters) !=
+		       le32_to_cpu(split_rec->e_cpos));
+
+		subtree_index = ocfs2_find_subtree_root(inode,
+							left_path, right_path);
+
+		ret = ocfs2_extend_rotate_transaction(handle, subtree_index,
+						      handle->h_buffer_credits,
+						      left_path);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		root_bh = left_path->p_node[subtree_index].bh;
+		BUG_ON(root_bh != right_path->p_node[subtree_index].bh);
+
+		ret = ocfs2_journal_access(handle, inode, root_bh,
+					   OCFS2_JOURNAL_ACCESS_WRITE);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		for (i = subtree_index + 1;
+		     i < path_num_items(right_path); i++) {
+			ret = ocfs2_journal_access(handle, inode,
+						   right_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+
+			ret = ocfs2_journal_access(handle, inode,
+						   left_path->p_node[i].bh,
+						   OCFS2_JOURNAL_ACCESS_WRITE);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+		}
+	} else {
+		left_rec = &el->l_recs[index - 1];
+		if (ocfs2_is_empty_extent(&el->l_recs[0]))
+			has_empty_extent = 1;
+	}
 
 	ret = ocfs2_journal_access(handle, inode, bh,
 				   OCFS2_JOURNAL_ACCESS_WRITE);
@@ -2790,9 +3041,8 @@ static int ocfs2_merge_rec_left(struct inode *inode, struct buffer_head *bh,
 		*left_rec = *split_rec;
 
 		has_empty_extent = 0;
-	} else {
+	} else
 		le16_add_cpu(&left_rec->e_leaf_clusters, split_clusters);
-	}
 
 	le32_add_cpu(&right_rec->e_cpos, split_clusters);
 	le64_add_cpu(&right_rec->e_blkno,
@@ -2805,13 +3055,44 @@ static int ocfs2_merge_rec_left(struct inode *inode, struct buffer_head *bh,
 	if (ret)
 		mlog_errno(ret);
 
+	if (left_path) {
+		ret = ocfs2_journal_dirty(handle, path_leaf_bh(left_path));
+		if (ret)
+			mlog_errno(ret);
+
+		/*
+		 * In the situation that the right_rec is empty and the extent
+		 * block is empty also,  ocfs2_complete_edge_insert can't handle
+		 * it and we need to delete the right extent block.
+		 */
+		if (le16_to_cpu(right_rec->e_leaf_clusters) == 0 &&
+		    le16_to_cpu(el->l_next_free_rec) == 1) {
+
+			ret = ocfs2_remove_rightmost_path(inode, handle,
+							  right_path, dealloc);
+			if (ret) {
+				mlog_errno(ret);
+				goto out;
+			}
+
+			/* Now the rightmost extent block has been deleted.
+			 * So we use the new rightmost path.
+			 */
+			ocfs2_mv_path(right_path, left_path);
+			left_path = NULL;
+		} else
+			ocfs2_complete_edge_insert(inode, handle, left_path,
+						   right_path, subtree_index);
+	}
 out:
+	if (left_path)
+		ocfs2_free_path(left_path);
 	return ret;
 }
 
 static int ocfs2_try_to_merge_extent(struct inode *inode,
 				     handle_t *handle,
-				     struct ocfs2_path *left_path,
+				     struct ocfs2_path *path,
 				     int split_index,
 				     struct ocfs2_extent_rec *split_rec,
 				     struct ocfs2_cached_dealloc_ctxt *dealloc,
@@ -2819,7 +3100,7 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 
 {
 	int ret = 0;
-	struct ocfs2_extent_list *el = path_leaf_el(left_path);
+	struct ocfs2_extent_list *el = path_leaf_el(path);
 	struct ocfs2_extent_rec *rec = &el->l_recs[split_index];
 
 	BUG_ON(ctxt->c_contig_type == CONTIG_NONE);
@@ -2832,7 +3113,7 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 		 * extents - having more than one in a leaf is
 		 * illegal.
 		 */
-		ret = ocfs2_rotate_tree_left(inode, handle, left_path,
+		ret = ocfs2_rotate_tree_left(inode, handle, path,
 					     dealloc);
 		if (ret) {
 			mlog_errno(ret);
@@ -2847,7 +3128,6 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 		 * Left-right contig implies this.
 		 */
 		BUG_ON(!ctxt->c_split_covers_rec);
-		BUG_ON(split_index == 0);
 
 		/*
 		 * Since the leftright insert always covers the entire
@@ -2858,9 +3138,14 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 		 * Since the adding of an empty extent shifts
 		 * everything back to the right, there's no need to
 		 * update split_index here.
+		 *
+		 * When the split_index is zero, we need to merge it to the
+		 * previous extent block. It is more efficient and easier
+		 * if we do merge_right first and merge_left later.
 		 */
-		ret = ocfs2_merge_rec_left(inode, path_leaf_bh(left_path),
-					   handle, split_rec, el, split_index);
+		ret = ocfs2_merge_rec_right(inode, path,
+					    handle, split_rec,
+					    split_index);
 		if (ret) {
 			mlog_errno(ret);
 			goto out;
@@ -2871,32 +3156,30 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 		 */
 		BUG_ON(!ocfs2_is_empty_extent(&el->l_recs[0]));
 
-		/*
-		 * The left merge left us with an empty extent, remove
-		 * it.
-		 */
-		ret = ocfs2_rotate_tree_left(inode, handle, left_path, dealloc);
+		/* The merge left us with an empty extent, remove it. */
+		ret = ocfs2_rotate_tree_left(inode, handle, path, dealloc);
 		if (ret) {
 			mlog_errno(ret);
 			goto out;
 		}
-		split_index--;
+
 		rec = &el->l_recs[split_index];
 
 		/*
 		 * Note that we don't pass split_rec here on purpose -
-		 * we've merged it into the left side.
+		 * we've merged it into the rec already.
 		 */
-		ret = ocfs2_merge_rec_right(inode, path_leaf_bh(left_path),
-					    handle, rec, el, split_index);
+		ret = ocfs2_merge_rec_left(inode, path,
+					   handle, rec,
+					   dealloc,
+					   split_index);
+
 		if (ret) {
 			mlog_errno(ret);
 			goto out;
 		}
 
-		BUG_ON(!ocfs2_is_empty_extent(&el->l_recs[0]));
-
-		ret = ocfs2_rotate_tree_left(inode, handle, left_path,
+		ret = ocfs2_rotate_tree_left(inode, handle, path,
 					     dealloc);
 		/*
 		 * Error from this last rotate is not critical, so
@@ -2915,8 +3198,9 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 		 */
 		if (ctxt->c_contig_type == CONTIG_RIGHT) {
 			ret = ocfs2_merge_rec_left(inode,
-						   path_leaf_bh(left_path),
-						   handle, split_rec, el,
+						   path,
+						   handle, split_rec,
+						   dealloc,
 						   split_index);
 			if (ret) {
 				mlog_errno(ret);
@@ -2924,8 +3208,8 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 			}
 		} else {
 			ret = ocfs2_merge_rec_right(inode,
-						    path_leaf_bh(left_path),
-						    handle, split_rec, el,
+						    path,
+						    handle, split_rec,
 						    split_index);
 			if (ret) {
 				mlog_errno(ret);
@@ -2938,7 +3222,7 @@ static int ocfs2_try_to_merge_extent(struct inode *inode,
 			 * The merge may have left an empty extent in
 			 * our leaf. Try to rotate it away.
 			 */
-			ret = ocfs2_rotate_tree_left(inode, handle, left_path,
+			ret = ocfs2_rotate_tree_left(inode, handle, path,
 						     dealloc);
 			if (ret)
 				mlog_errno(ret);
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 53/62] ocfs2: Enable cross extent block merge.
From: Mark Fasheh @ 2008-04-02 20:15 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Tao Ma

From: Tao Ma <tao.ma@oracle.com>

ocfs2_figure_merge_contig_type now checks whether a merge across an
extent block boundary is possible and, if so, enables it by setting
CONTIG_LEFT and CONTIG_RIGHT accordingly.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/alloc.c |   94 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index f63cb32..7d81aa6 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -3782,20 +3782,57 @@ out:
 }
 
 static enum ocfs2_contig_type
-ocfs2_figure_merge_contig_type(struct inode *inode,
+ocfs2_figure_merge_contig_type(struct inode *inode, struct ocfs2_path *path,
 			       struct ocfs2_extent_list *el, int index,
 			       struct ocfs2_extent_rec *split_rec)
 {
-	struct ocfs2_extent_rec *rec;
+	int status;
 	enum ocfs2_contig_type ret = CONTIG_NONE;
+	u32 left_cpos, right_cpos;
+	struct ocfs2_extent_rec *rec = NULL;
+	struct ocfs2_extent_list *new_el;
+	struct ocfs2_path *left_path = NULL, *right_path = NULL;
+	struct buffer_head *bh;
+	struct ocfs2_extent_block *eb;
+
+	if (index > 0) {
+		rec = &el->l_recs[index - 1];
+	} else if (path->p_tree_depth > 0) {
+		status = ocfs2_find_cpos_for_left_leaf(inode->i_sb,
+						       path, &left_cpos);
+		if (status)
+			goto out;
+
+		if (left_cpos != 0) {
+			left_path = ocfs2_new_path(path_root_bh(path),
+						   path_root_el(path));
+			if (!left_path)
+				goto out;
+
+			status = ocfs2_find_path(inode, left_path, left_cpos);
+			if (status)
+				goto out;
+
+			new_el = path_leaf_el(left_path);
+
+			if (le16_to_cpu(new_el->l_next_free_rec) !=
+			    le16_to_cpu(new_el->l_count)) {
+				bh = path_leaf_bh(left_path);
+				eb = (struct ocfs2_extent_block *)bh->b_data;
+				OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb,
+								 eb);
+				goto out;
+			}
+			rec = &new_el->l_recs[
+				le16_to_cpu(new_el->l_next_free_rec) - 1];
+		}
+	}
 
 	/*
 	 * We're careful to check for an empty extent record here -
 	 * the merge code will know what to do if it sees one.
 	 */
-
-	if (index > 0) {
-		rec = &el->l_recs[index - 1];
+	if (rec) {
 		if (index == 1 && ocfs2_is_empty_extent(rec)) {
 			if (split_rec->e_cpos == el->l_recs[index].e_cpos)
 				ret = CONTIG_RIGHT;
@@ -3804,10 +3841,45 @@ ocfs2_figure_merge_contig_type(struct inode *inode,
 		}
 	}
 
-	if (index < (le16_to_cpu(el->l_next_free_rec) - 1)) {
+	rec = NULL;
+	if (index < (le16_to_cpu(el->l_next_free_rec) - 1))
+		rec = &el->l_recs[index + 1];
+	else if (le16_to_cpu(el->l_next_free_rec) == le16_to_cpu(el->l_count) &&
+		 path->p_tree_depth > 0) {
+		status = ocfs2_find_cpos_for_right_leaf(inode->i_sb,
+							path, &right_cpos);
+		if (status)
+			goto out;
+
+		if (right_cpos == 0)
+			goto out;
+
+		right_path = ocfs2_new_path(path_root_bh(path),
+					    path_root_el(path));
+		if (!right_path)
+			goto out;
+
+		status = ocfs2_find_path(inode, right_path, right_cpos);
+		if (status)
+			goto out;
+
+		new_el = path_leaf_el(right_path);
+		rec = &new_el->l_recs[0];
+		if (ocfs2_is_empty_extent(rec)) {
+			if (le16_to_cpu(new_el->l_next_free_rec) <= 1) {
+				bh = path_leaf_bh(right_path);
+				eb = (struct ocfs2_extent_block *)bh->b_data;
+				OCFS2_RO_ON_INVALID_EXTENT_BLOCK(inode->i_sb,
+								 eb);
+				goto out;
+			}
+			rec = &new_el->l_recs[1];
+		}
+	}
+
+	if (rec) {
 		enum ocfs2_contig_type contig_type;
 
-		rec = &el->l_recs[index + 1];
 		contig_type = ocfs2_extent_contig(inode, rec, split_rec);
 
 		if (contig_type == CONTIG_LEFT && ret == CONTIG_RIGHT)
@@ -3816,6 +3888,12 @@ ocfs2_figure_merge_contig_type(struct inode *inode,
 			ret = contig_type;
 	}
 
+out:
+	if (left_path)
+		ocfs2_free_path(left_path);
+	if (right_path)
+		ocfs2_free_path(right_path);
+
 	return ret;
 }
 
@@ -4278,7 +4356,7 @@ static int __ocfs2_mark_extent_written(struct inode *inode,
 		goto out;
 	}
 
-	ctxt.c_contig_type = ocfs2_figure_merge_contig_type(inode, el,
+	ctxt.c_contig_type = ocfs2_figure_merge_contig_type(inode, path, el,
 							    split_index,
 							    split_rec);
 
-- 
1.5.4.1
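
The decision made by ocfs2_figure_merge_contig_type() can be modeled as below once the neighbors have been located, either in the same leaf or, when the split record sits at the edge of a full leaf, in the adjacent leaf. This is a simplified sketch: the types, BLOCKS_PER_CLUSTER and the function names are hypothetical, and it skips the empty-extent special case and any other checks the real code performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum contig_type { CONTIG_NONE = 0, CONTIG_LEFT, CONTIG_RIGHT, CONTIG_LEFTRIGHT };

/* Hypothetical, simplified extent record. */
struct ext_rec { uint32_t cpos; uint32_t clusters; uint64_t blkno; };

#define BLOCKS_PER_CLUSTER 8 /* assumed ratio */

/* Does a end exactly where b begins, both logically and physically? */
static int recs_contig(const struct ext_rec *a, const struct ext_rec *b)
{
	return a->cpos + a->clusters == b->cpos &&
	       a->blkno + (uint64_t)a->clusters * BLOCKS_PER_CLUSTER == b->blkno;
}

/*
 * prev/next are the neighbors of split, possibly taken from the adjacent
 * leaf. Either may be NULL when no such neighbor exists.
 */
enum contig_type figure_merge_contig_type(const struct ext_rec *prev,
					  const struct ext_rec *split,
					  const struct ext_rec *next)
{
	enum contig_type ret = CONTIG_NONE;

	assert(split != NULL);

	/* split continues prev: prev can be extended to the right. */
	if (prev && recs_contig(prev, split))
		ret = CONTIG_RIGHT;

	/* split runs into next: next can be extended to the left. */
	if (next && recs_contig(split, next))
		ret = (ret == CONTIG_RIGHT) ? CONTIG_LEFTRIGHT : CONTIG_LEFT;

	return ret;
}
```

Note the naming: CONTIG_RIGHT means the split record continues its left neighbor, which is why ocfs2_try_to_merge_extent() answers CONTIG_RIGHT with ocfs2_merge_rec_left().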

* [Ocfs2-devel] [PATCH 54/62] ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
From: Mark Fasheh @ 2008-04-02 20:15 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Tao Ma

From: Tao Ma <tao.ma@oracle.com>

In some cases (inode stealing from other nodes), we may not want
ocfs2_reserve_suballoc_bits to allocate new groups from the
global_bitmap, since it may already be full. So add a new parameter
to control this.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/suballoc.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 72c198a..3be4e73 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -46,6 +46,9 @@
 
 #include "buffer_head_io.h"
 
+#define NOT_ALLOC_NEW_GROUP		0
+#define ALLOC_NEW_GROUP			1
+
 static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg);
 static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe);
 static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl);
@@ -391,7 +394,8 @@ bail:
 static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 				       struct ocfs2_alloc_context *ac,
 				       int type,
-				       u32 slot)
+				       u32 slot,
+				       int alloc_new_group)
 {
 	int status;
 	u32 bits_wanted = ac->ac_bits_wanted;
@@ -446,6 +450,14 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 			goto bail;
 		}
 
+		if (alloc_new_group != ALLOC_NEW_GROUP) {
+			mlog(0, "Alloc File %u Full: wanted=%u, free_bits=%u, "
+			     "and we don't alloc a new group for it.\n",
+			     slot, bits_wanted, free_bits);
+			status = -ENOSPC;
+			goto bail;
+		}
+
 		status = ocfs2_block_group_alloc(osb, alloc_inode, bh);
 		if (status < 0) {
 			if (status != -ENOSPC)
@@ -490,7 +502,8 @@ int ocfs2_reserve_new_metadata(struct ocfs2_super *osb,
 	(*ac)->ac_group_search = ocfs2_block_group_search;
 
 	status = ocfs2_reserve_suballoc_bits(osb, (*ac),
-					     EXTENT_ALLOC_SYSTEM_INODE, slot);
+					     EXTENT_ALLOC_SYSTEM_INODE,
+					     slot, ALLOC_NEW_GROUP);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
@@ -527,7 +540,7 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
-					     osb->slot_num);
+					     osb->slot_num, ALLOC_NEW_GROUP);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
@@ -557,7 +570,8 @@ int ocfs2_reserve_cluster_bitmap_bits(struct ocfs2_super *osb,
 
 	status = ocfs2_reserve_suballoc_bits(osb, ac,
 					     GLOBAL_BITMAP_SYSTEM_INODE,
-					     OCFS2_INVALID_SLOT);
+					     OCFS2_INVALID_SLOT,
+					     ALLOC_NEW_GROUP);
 	if (status < 0 && status != -ENOSPC) {
 		mlog_errno(status);
 		goto bail;
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 55/62] ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
From: Mark Fasheh @ 2008-04-02 20:15 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Tao Ma

From: Tao Ma <tao.ma@oracle.com>

With inode stealing, allocation is no longer restricted to the
local node. So it is necessary to add a new member to
ocfs2_alloc_context to indicate which slot we are allocating
from. We also modify the local alloc path so that this member is
set there as well.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/localalloc.c |    2 ++
 fs/ocfs2/suballoc.c   |    1 +
 fs/ocfs2/suballoc.h   |    1 +
 3 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index ab83fd5..b6d0719 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -523,6 +523,8 @@ int ocfs2_reserve_local_alloc_bits(struct ocfs2_super *osb,
 	}
 
 	ac->ac_inode = local_alloc_inode;
+	/* We should never use localalloc from another slot */
+	ac->ac_alloc_slot = osb->slot_num;
 	ac->ac_which = OCFS2_AC_USE_LOCAL;
 	get_bh(osb->local_alloc_bh);
 	ac->ac_bh = osb->local_alloc_bh;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 3be4e73..33d5573 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -424,6 +424,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 	}
 
 	ac->ac_inode = alloc_inode;
+	ac->ac_alloc_slot = slot;
 
 	fe = (struct ocfs2_dinode *) bh->b_data;
 	if (!OCFS2_IS_VALID_DINODE(fe)) {
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index 8799033..544c600 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -36,6 +36,7 @@ typedef int (group_search_t)(struct inode *,
 struct ocfs2_alloc_context {
 	struct inode *ac_inode;    /* which bitmap are we allocating from? */
 	struct buffer_head *ac_bh; /* file entry bh */
+	u32    ac_alloc_slot;   /* which slot are we allocating from? */
 	u32    ac_bits_wanted;
 	u32    ac_bits_given;
 #define OCFS2_AC_USE_LOCAL 1
-- 
1.5.4.1
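
Why the slot has to be recorded: once an inode can come from another slot's allocator, the inode's i_suballoc_slot must name that slot (patch 56 changes ocfs2_mknod_locked() to use ac_alloc_slot for this), so that freeing returns the bit to the right bitmap. A minimal sketch with hypothetical per-slot 64-bit bitmaps and simplified structures:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_SLOTS 4

/* Hypothetical: one 64-bit inode bitmap per slot. */
uint64_t inode_bitmap[MAX_SLOTS];

struct alloc_ctx  { uint16_t alloc_slot; };  /* mirrors ac_alloc_slot */
struct toy_dinode { uint16_t suballoc_slot; uint16_t suballoc_bit; };

/* Claim the first free bit in the slot chosen by the context. */
int alloc_inode_bit(const struct alloc_ctx *ac, struct toy_dinode *fe)
{
	assert(ac->alloc_slot < MAX_SLOTS);

	for (int bit = 0; bit < 64; bit++) {
		uint64_t mask = UINT64_C(1) << bit;

		if (!(inode_bitmap[ac->alloc_slot] & mask)) {
			inode_bitmap[ac->alloc_slot] |= mask;
			/* Record the real owner slot, not the local one. */
			fe->suballoc_slot = ac->alloc_slot;
			fe->suballoc_bit = (uint16_t)bit;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Freeing goes back to the slot recorded in the inode. */
void free_inode_bit(const struct toy_dinode *fe)
{
	assert(fe->suballoc_slot < MAX_SLOTS);
	inode_bitmap[fe->suballoc_slot] &= ~(UINT64_C(1) << fe->suballoc_bit);
}
```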

* [Ocfs2-devel] [PATCH 56/62] ocfs2: Add inode stealing for ocfs2_reserve_new_inode
From: Mark Fasheh @ 2008-04-02 20:15 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Tao Ma

From: Tao Ma <tao.ma@oracle.com>

Inode allocation is modified to look in other nodes' allocators during
extreme out-of-space situations. We retry our own slot when space is freed
back to the global bitmap, or whenever we've allocated more than 1024 inodes
from another slot.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/alloc.c      |    2 +
 fs/ocfs2/localalloc.c |    2 +
 fs/ocfs2/namei.c      |    2 +-
 fs/ocfs2/ocfs2.h      |   34 +++++++++++++++++++-
 fs/ocfs2/suballoc.c   |   80 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/ocfs2/super.c      |    1 +
 6 files changed, 116 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 7d81aa6..a268821 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5150,6 +5150,8 @@ static void ocfs2_truncate_log_worker(struct work_struct *work)
 	status = ocfs2_flush_truncate_log(osb);
 	if (status < 0)
 		mlog_errno(status);
+	else
+		ocfs2_init_inode_steal_slot(osb);
 
 	mlog_exit(status);
 }
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index b6d0719..ce0dc14 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -447,6 +447,8 @@ out_mutex:
 	iput(main_bm_inode);
 
 out:
+	if (!status)
+		ocfs2_init_inode_steal_slot(osb);
 	mlog_exit(status);
 	return status;
 }
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index ae9ad95..ab5a227 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -424,7 +424,7 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
 	fe->i_fs_generation = cpu_to_le32(osb->fs_generation);
 	fe->i_blkno = cpu_to_le64(fe_blkno);
 	fe->i_suballoc_bit = cpu_to_le16(suballoc_bit);
-	fe->i_suballoc_slot = cpu_to_le16(osb->slot_num);
+	fe->i_suballoc_slot = cpu_to_le16(inode_ac->ac_alloc_slot);
 	fe->i_uid = cpu_to_le32(current->fsuid);
 	if (dir->i_mode & S_ISGID) {
 		fe->i_gid = cpu_to_le32(dir->i_gid);
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 9ff5811..3169237 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -208,11 +208,14 @@ struct ocfs2_super
 	u32 s_feature_incompat;
 	u32 s_feature_ro_compat;
 
-	/* Protects s_next_generaion, osb_flags. Could protect more on
-	 * osb as it's very short lived. */
+	/* Protects s_next_generation, osb_flags and s_inode_steal_slot.
+	 * Could protect more on osb as it's very short lived.
+	 */
 	spinlock_t osb_lock;
 	u32 s_next_generation;
 	unsigned long osb_flags;
+	s16 s_inode_steal_slot;
+	atomic_t s_num_inodes_stolen;
 
 	unsigned long s_mount_opt;
 	unsigned int s_atime_quantum;
@@ -537,6 +540,33 @@ static inline unsigned int ocfs2_pages_per_cluster(struct super_block *sb)
 	return pages_per_cluster;
 }
 
+static inline void ocfs2_init_inode_steal_slot(struct ocfs2_super *osb)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = OCFS2_INVALID_SLOT;
+	spin_unlock(&osb->osb_lock);
+	atomic_set(&osb->s_num_inodes_stolen, 0);
+}
+
+static inline void ocfs2_set_inode_steal_slot(struct ocfs2_super *osb,
+					      s16 slot)
+{
+	spin_lock(&osb->osb_lock);
+	osb->s_inode_steal_slot = slot;
+	spin_unlock(&osb->osb_lock);
+}
+
+static inline s16 ocfs2_get_inode_steal_slot(struct ocfs2_super *osb)
+{
+	s16 slot;
+
+	spin_lock(&osb->osb_lock);
+	slot = osb->s_inode_steal_slot;
+	spin_unlock(&osb->osb_lock);
+
+	return slot;
+}
+
 #define ocfs2_set_bit ext2_set_bit
 #define ocfs2_clear_bit ext2_clear_bit
 #define ocfs2_test_bit ext2_test_bit
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 33d5573..d2d278f 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -49,6 +49,8 @@
 #define NOT_ALLOC_NEW_GROUP		0
 #define ALLOC_NEW_GROUP			1
 
+#define OCFS2_MAX_INODES_TO_STEAL	1024
+
 static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg);
 static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe);
 static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl);
@@ -109,7 +111,7 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
 						u64 *bg_blkno,
 						u16 *bg_bit_off);
 
-void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+static void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac)
 {
 	struct inode *inode = ac->ac_inode;
 
@@ -120,9 +122,17 @@ void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
 		mutex_unlock(&inode->i_mutex);
 
 		iput(inode);
+		ac->ac_inode = NULL;
 	}
-	if (ac->ac_bh)
+	if (ac->ac_bh) {
 		brelse(ac->ac_bh);
+		ac->ac_bh = NULL;
+	}
+}
+
+void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
+{
+	ocfs2_free_ac_resource(ac);
 	kfree(ac);
 }
 
@@ -522,10 +532,42 @@ bail:
 	return status;
 }
 
+static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
+					      struct ocfs2_alloc_context *ac)
+{
+	int i, status = -ENOSPC;
+	s16 slot = ocfs2_get_inode_steal_slot(osb);
+
+	/* Start to steal inodes from the first slot after ours. */
+	if (slot == OCFS2_INVALID_SLOT)
+		slot = osb->slot_num + 1;
+
+	for (i = 0; i < osb->max_slots; i++, slot++) {
+		if (slot == osb->max_slots)
+			slot = 0;
+
+		if (slot == osb->slot_num)
+			continue;
+
+		status = ocfs2_reserve_suballoc_bits(osb, ac,
+						     INODE_ALLOC_SYSTEM_INODE,
+						     slot, NOT_ALLOC_NEW_GROUP);
+		if (status >= 0) {
+			ocfs2_set_inode_steal_slot(osb, slot);
+			break;
+		}
+
+		ocfs2_free_ac_resource(ac);
+	}
+
+	return status;
+}
+
 int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 			    struct ocfs2_alloc_context **ac)
 {
 	int status;
+	s16 slot = ocfs2_get_inode_steal_slot(osb);
 
 	*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
 	if (!(*ac)) {
@@ -539,9 +581,43 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 
 	(*ac)->ac_group_search = ocfs2_block_group_search;
 
+	/*
+	 * slot is set when we successfully steal inode from other nodes.
+	 * It is reset in 3 places:
+	 * 1. when we flush the truncate log
+	 * 2. when we complete local alloc recovery.
+	 * 3. when we successfully allocate from our own slot.
+	 * After it is set, we will go on stealing inodes until we find the
+	 * need to check our slots to see whether there is some space for us.
+	 */
+	if (slot != OCFS2_INVALID_SLOT &&
+	    atomic_read(&osb->s_num_inodes_stolen) < OCFS2_MAX_INODES_TO_STEAL)
+		goto inode_steal;
+
+	atomic_set(&osb->s_num_inodes_stolen, 0);
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
 					     osb->slot_num, ALLOC_NEW_GROUP);
+	if (status >= 0) {
+		status = 0;
+
+		/*
+		 * Some inodes must be freed by us, so try to allocate
+		 * from our own next time.
+		 */
+		if (slot != OCFS2_INVALID_SLOT)
+			ocfs2_init_inode_steal_slot(osb);
+		goto bail;
+	} else if (status < 0 && status != -ENOSPC) {
+		mlog_errno(status);
+		goto bail;
+	}
+
+	ocfs2_free_ac_resource(*ac);
+
+inode_steal:
+	status = ocfs2_steal_inode_from_other_nodes(osb, *ac);
+	atomic_inc(&osb->s_num_inodes_stolen);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 96ebe36..df63ba2 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1394,6 +1394,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	INIT_LIST_HEAD(&osb->blocked_lock_list);
 	osb->blocked_lock_count = 0;
 	spin_lock_init(&osb->osb_lock);
+	ocfs2_init_inode_steal_slot(osb);
 
 	atomic_set(&osb->alloc_stats.moves, 0);
 	atomic_set(&osb->alloc_stats.local_data, 0);
-- 
1.5.4.1

* [Ocfs2-devel] [PATCH 57/62] fs/ocfs2/aops.c: test for IS_ERR rather than 0
From: Mark Fasheh @ 2008-04-02 20:15 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Julia Lawall

From: Julia Lawall <julia@diku.dk>

The function ocfs2_start_trans always returns either a valid pointer or a
value made with ERR_PTR, so its result should be tested with IS_ERR, not
with a test for 0.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/aops.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 90383ed..17964c0 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -467,11 +467,11 @@ handle_t *ocfs2_start_walk_page_trans(struct inode *inode,
 							 unsigned to)
 {
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
-	handle_t *handle = NULL;
+	handle_t *handle;
 	int ret = 0;
 
 	handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
-	if (!handle) {
+	if (IS_ERR(handle)) {
 		ret = -ENOMEM;
 		mlog_errno(ret);
 		goto out;
@@ -487,7 +487,7 @@ handle_t *ocfs2_start_walk_page_trans(struct inode *inode,
 	}
 out:
 	if (ret) {
-		if (handle)
+		if (!IS_ERR(handle))
 			ocfs2_commit_trans(osb, handle);
 		handle = ERR_PTR(ret);
 	}
-- 
1.5.4.1
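
Why the NULL test was wrong: an ERR_PTR() value is a non-NULL pointer in the last 4095 values of the address space, so !handle never fires on failure. The helpers below are userspace copies modeled on include/linux/err.h; handle_t here, start_trans() and check_start() are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace copies of the kernel's include/linux/err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for ocfs2_start_trans(): never returns NULL. */
typedef struct { int credits; } handle_t;
static handle_t the_handle;

handle_t *start_trans(int credits, int fail_with)
{
	if (fail_with)
		return ERR_PTR(-fail_with);
	the_handle.credits = credits;
	return &the_handle;
}

/* Returns 0 on success or the negative errno carried by the pointer. */
long check_start(int fail_with)
{
	handle_t *handle = start_trans(1, fail_with);

	assert(handle != NULL); /* error pointers are never NULL */
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	return 0;
}
```

The sketch returns PTR_ERR(handle) and so keeps the original error code; the patch above keeps reporting -ENOMEM for any failure.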

* [Ocfs2-devel] [PATCH 58/62] ocfs2: Improve rename locking
From: Mark Fasheh @ 2008-04-02 20:15 UTC
  To: linux-kernel; +Cc: ocfs2-devel, Joel Becker, Jan Kara

From: Jan Kara <jack@suse.cz>

ocfs2_rename() was being too aggressive with the rename lock - we only need
it for certain forms of directory rename.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/namei.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index ab5a227..d5d808f 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -997,7 +997,7 @@ static int ocfs2_rename(struct inode *old_dir,
 	 *
 	 * And that's why, just like the VFS, we need a file system
 	 * rename lock. */
-	if (old_dentry != new_dentry) {
+	if (old_dir != new_dir && S_ISDIR(old_inode->i_mode)) {
 		status = ocfs2_rename_lock(osb);
 		if (status < 0) {
 			mlog_errno(status);
-- 
1.5.4.1

Thread overview: 59+ messages
2008-04-02 20:14 [Ocfs2-devel] [PATCH 0/62] Ocfs2 updates for 2.6.26-rc1 Mark Fasheh
2008-04-02 20:14 ` [Ocfs2-devel] [PATCH 01/62] ocfs2: Move slot map access into slot_map.c Mark Fasheh
2008-04-02 20:14   ` [Ocfs2-devel] [PATCH 02/62] ocfs2: Make ocfs2_slot_info private Mark Fasheh
2008-04-02 20:14     ` [Ocfs2-devel] [PATCH 03/62] ocfs2: Change the recovery map to an array of node numbers Mark Fasheh
2008-04-02 20:14       ` [Ocfs2-devel] [PATCH 04/62] ocfs2: slot_map I/O based on max_slots Mark Fasheh
2008-04-02 20:14         ` [Ocfs2-devel] [PATCH 05/62] ocfs2: De-magic the in-memory slot map Mark Fasheh
2008-04-02 20:14           ` [Ocfs2-devel] [PATCH 06/62] ocfs2: Define the contents of the slot_map file Mark Fasheh
2008-04-02 20:14             ` [Ocfs2-devel] [PATCH 07/62] ocfs2: New slot map format Mark Fasheh
2008-04-02 20:14               ` [Ocfs2-devel] [PATCH 08/62] ocfs2: Separate out dlm lock functions Mark Fasheh
2008-04-02 20:14                 ` [Ocfs2-devel] [PATCH 09/62] ocfs2: Use global DLM_ constants in generic code Mark Fasheh
2008-04-02 20:14                   ` [Ocfs2-devel] [PATCH 10/62] ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API Mark Fasheh
2008-04-02 20:14                     ` [Ocfs2-devel] [PATCH 11/62] ocfs2: Create the lock status block union Mark Fasheh
2008-04-02 20:14                       ` [Ocfs2-devel] [PATCH 12/62] ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API Mark Fasheh
2008-04-02 20:14                         ` [Ocfs2-devel] [PATCH 13/62] ocfs2: Abstract out node number queries Mark Fasheh
2008-04-02 20:14                           ` [Ocfs2-devel] [PATCH 14/62] ocfs2: Move o2hb functionality into the stack glue Mark Fasheh
2008-04-02 20:14                             ` [Ocfs2-devel] [PATCH 15/62] ocfs2: Fill node number during cluster stack init Mark Fasheh
2008-04-02 20:14                               ` [Ocfs2-devel] [PATCH 16/62] ocfs2: Remove CANCELGRANT from the view of dlmglue Mark Fasheh
2008-04-02 20:14                                 ` [Ocfs2-devel] [PATCH 17/62] ocfs2: handle async EAGAIN from NOQUEUE request Mark Fasheh
2008-04-02 20:14                                   ` [Ocfs2-devel] [PATCH 18/62] ocfs2: Abstract out a debugging function for underlying dlms Mark Fasheh
2008-04-02 20:14                                     ` [Ocfs2-devel] [PATCH 19/62] ocfs2: Clean up stackglue initialization Mark Fasheh
2008-04-02 20:14                                       ` [Ocfs2-devel] [PATCH 20/62] ocfs2: Split o2cb code from generic stack functions Mark Fasheh
2008-04-02 20:14                                         ` [Ocfs2-devel] [PATCH 21/62] ocfs2: Create ocfs2_stack_operations and split out the o2cb stack Mark Fasheh
2008-04-02 20:14                                           ` [Ocfs2-devel] [PATCH 22/62] ocfs2: Break out stackglue into modules Mark Fasheh
2008-04-02 20:14                                             ` [Ocfs2-devel] [PATCH 23/62] ocfs2: Create stack glue sysfs files Mark Fasheh
2008-04-02 20:14                                               ` [Ocfs2-devel] [PATCH 24/62] ocfs2: Add the USERSPACE_STACK incompat bit Mark Fasheh
2008-04-02 20:14                                                 ` [Ocfs2-devel] [PATCH 25/62] ocfs2: Add the 'cluster_stack' sysfs file Mark Fasheh
2008-04-02 20:14                                                   ` [Ocfs2-devel] [PATCH 26/62] ocfs2: Add the user stack module Mark Fasheh
2008-04-02 20:14                                                     ` [Ocfs2-devel] [PATCH 27/62] ocfs2: Add the ocfs2_control misc device Mark Fasheh
2008-04-02 20:14                                                       ` [Ocfs2-devel] [PATCH 28/62] ocfs2: Start the ocfs2_control handshake Mark Fasheh
2008-04-02 20:14                                                         ` [Ocfs2-devel] [PATCH 29/62] ocfs2: Introduce the DOWN message to ocfs2_control Mark Fasheh
2008-04-02 20:14                                                           ` [Ocfs2-devel] [PATCH 30/62] ocfs2: Add the local node id to the handshake Mark Fasheh
2008-04-02 20:14                                                             ` [Ocfs2-devel] [PATCH 31/62] ocfs2: Add the 'set version' message to the ocfs2_control device Mark Fasheh
2008-04-02 20:14                                                               ` [Ocfs2-devel] [PATCH 32/62] ocfs2: add fsdlm to stackglue Mark Fasheh
2008-04-02 20:14                                                                 ` [Ocfs2-devel] [PATCH 33/62] ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h Mark Fasheh
2008-04-02 20:14                                                                   ` [Ocfs2-devel] [PATCH 34/62] ocfs2: Add kbuild for ocfs2_stack_user.ko Mark Fasheh
2008-04-02 20:14                                                                     ` [Ocfs2-devel] [PATCH 35/62] ocfs2: Allow selection of cluster plug-ins Mark Fasheh
2008-04-02 20:14                                                                       ` [Ocfs2-devel] [PATCH 36/62] ocfs2: Document /sys/fs/ocfs2 Mark Fasheh
2008-04-02 20:14                                                                         ` [Ocfs2-devel] [PATCH 37/62] ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle Mark Fasheh
2008-04-02 20:14                                                                           ` [Ocfs2-devel] [PATCH 38/62] ocfs2/dlm: Create slabcaches for lock and lockres Mark Fasheh
2008-04-02 20:14                                                                             ` [Ocfs2-devel] [PATCH 39/62] ocfs2/dlm: Link all lockres' to a tracking list Mark Fasheh
2008-04-02 20:14                                                                               ` [Ocfs2-devel] [PATCH 40/62] ocfs2/dlm: Create debugfs dirs Mark Fasheh
2008-04-02 20:14                                                                                 ` [Ocfs2-devel] [PATCH 41/62] ocfs2/dlm: Dump the dlm state in a debugfs file Mark Fasheh
2008-04-02 20:14                                                                                   ` [Ocfs2-devel] [PATCH 42/62] ocfs2/dlm: Dumps the lockres' into " Mark Fasheh
2008-04-02 20:14                                                                                     ` [Ocfs2-devel] [PATCH 43/62] ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h Mark Fasheh
2008-04-02 20:14                                                                                       ` [Ocfs2-devel] [PATCH 44/62] ocfs2/dlm: Dumps the mles into a debugfs file Mark Fasheh
2008-04-02 20:14                                                                                         ` [Ocfs2-devel] [PATCH 45/62] ocfs2/dlm: Dumps the purgelist " Mark Fasheh
2008-04-02 20:14                                                                                           ` [Ocfs2-devel] [PATCH 46/62] ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c Mark Fasheh
2008-04-02 20:14                                                                                             ` [Ocfs2-devel] [PATCH 47/62] ocfs2/dlm: Fix lockname in lockres print function Mark Fasheh
2008-04-02 20:14                                                                                               ` [Ocfs2-devel] [PATCH 48/62] ocfs2/dlm: Cleanup lockres print Mark Fasheh
2008-04-02 20:14                                                                                                 ` [Ocfs2-devel] [PATCH 49/62] ocfs2: Reconnect after idle time out Mark Fasheh
2008-04-02 20:15                                                                                                   ` [Ocfs2-devel] [PATCH 50/62] sysfs: Allow removal of symlinks in the sysfs root Mark Fasheh
2008-04-02 20:15                                                                                                     ` [Ocfs2-devel] [PATCH 51/62] ocfs2: Move /sys/o2cb to /sys/fs/o2cb Mark Fasheh
2008-04-02 20:15                                                                                                       ` [Ocfs2-devel] [PATCH 52/62] ocfs2: Add support for cross extent block Mark Fasheh
2008-04-02 20:15                                                                                                         ` [Ocfs2-devel] [PATCH 53/62] ocfs2: Enable cross extent block merge Mark Fasheh
2008-04-02 20:15                                                                                                           ` [Ocfs2-devel] [PATCH 54/62] ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits Mark Fasheh
2008-04-02 20:15                                                                                                             ` [Ocfs2-devel] [PATCH 55/62] ocfs2: Add ac_alloc_slot in ocfs2_alloc_context Mark Fasheh
2008-04-02 20:15                                                                                                               ` [Ocfs2-devel] [PATCH 56/62] ocfs2: Add inode stealing for ocfs2_reserve_new_inode Mark Fasheh
2008-04-02 20:15                                                                                                                 ` [Ocfs2-devel] [PATCH 57/62] fs/ocfs2/aops.c: test for IS_ERR rather than 0 Mark Fasheh
2008-04-02 20:15                                                                                                                   ` [Ocfs2-devel] [PATCH 58/62] ocfs2: Improve rename locking Mark Fasheh
