public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/28] XFS: sync and reclaim rework
@ 2008-08-19 13:16 Dave Chinner
  2008-08-19 13:16 ` [PATCH 01/28] XFS: remove i_gen from incore inode Dave Chinner
                   ` (19 more replies)
  0 siblings, 20 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs


Multiple patch sets, all in one patch bomb against a current
git tree. This includes all outstanding patches I have previously
sent that are not committed plus a bunch more...

---

XFS: replace the mount inode list with radix tree traversals V4

The list of all inodes on a mount is superfluous. We can traverse
all inodes now by walking the per-AG inode radix trees without
needing a separate list. This enables us to remove a bunch of
complex list traversal code and remove another two pointers from
the xfs_inode.

Also, by replacing the sync traversal with an ascending inode
number traversal, we will issue better inode I/O patterns for
writeback triggered by xfssyncd or unmount.

Before we make this change, move all the relevant sync code
into it's own file in the linux-2.6/ directory. This aggregates
VFS specific sync interfacing in the one file and will allow
all the subsequent change history to be associated with this
file so it is easy to find in future.

Version 4:
o revert xfs_syncsub -> xfs_sync change in xfs_quiesce_fs and
  rediff patch series

---

XFS: clean up sync code

xfs_sync and xfs_syncsub are multiplexed interfaces that
shares relatively little code between callers. because it is
a multiplexed interface, it's hard to tell what is executed
in each context it is called.

Factor out the sync code and explicitly call the sync functions
needed rather than the multiplexed interfaces. Once this is
done, we can remove xfs_syncsub and xfs_sync altogether.

---

RFC: Combine Linux and XFS inodes V2

XFS currently has to deal with two separate inode lifecycles
which makes for complexity in inode lookups and reclaim. We
also have the problem of not always having a linux inode around
when it might be useful to have it.

To avoid these lifecycle problems, this series embedѕ the linux
inode inside the struct xfs_inode and changes the way we reference
to two inodes. We can no longer check for a null linux inode -
instead we have to check to see if it is valid or not by checking
either the linux inode or xfs inode state flags. While this means
that inodes waiting for reclaim use more memory, this is not the
commonn state for inodes and the will soon be completely freed so
the additional memeory use in this state is only a temporary issue.

This combining of the inodes simplifies the inode and reclaim logic,
making it possible to do reclaim via radix tree tags (an upcoming
patch series) and to be able to use RCU locking on the radix trees.
The fact that we don't have a simple mechanism to determine the
reclaim state of the inode makes RCU locking very complex, and this
complexity is removed by having a combined inode structure.

This patch series also changes the way XFS caches inodes. It no
longer uses the linux inode cache as the primary lookup cache -
instead we rely solely on the XFS inode caches. This avoids the
inode_lock in lookups that hit the cache - we should get much
better parallelism out of inode lookup than we currently do now.

The patch series also makes use of the slab 'init once' feature
for the XFS inodes. This means we only need to do partial
initialisation of the xfs (and embedded linux inode) whenever
we allocate a new inode.

In future, we should also be able to cull duplicate fields out of
the xfs and linux inodes reducing the overall memory usage of
the active inode cache. This provides scope for continuing to
reduce the memory footprint of the XFS inode cache.

Version 2
o reorder and rework as a result of review comments.

---

XFS: Track reclaimable inodes in inode cache.

Move the tracking of reclaimable inodes
into the inode radix trees. This currently does not replace
the reclaim flags in the inode, rather it allows traversal of
all reclaimable inodes by walking the per-AG inode radix trees without needing
a separate list. This enables us to remove a list and a lock to
remove a point of serialisation during inode reclaim.


Like the matching sync code, this also allows reclaim of inodes
in ascending inode numbers which substantially improves I/O
patterns during reclaim driven inode flushing.

---

Combined diffstat:

 fs/inode.c                     |  205 ++++++----
 fs/xfs/Makefile                |    1
 fs/xfs/linux-2.6/xfs_aops.c    |    2
 fs/xfs/linux-2.6/xfs_iops.c    |   19
 fs/xfs/linux-2.6/xfs_super.c   |  265 +++----------
 fs/xfs/linux-2.6/xfs_super.h   |    3
 fs/xfs/linux-2.6/xfs_sync.c    |  780 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/linux-2.6/xfs_sync.h    |   55 ++
 fs/xfs/linux-2.6/xfs_vfs.h     |   31 -
 fs/xfs/linux-2.6/xfs_vnode.c   |    6
 fs/xfs/linux-2.6/xfs_vnode.h   |    5
 fs/xfs/quota/xfs_qm.c          |   10
 fs/xfs/quota/xfs_qm_syscalls.c |  137 +++----
 fs/xfs/xfs_ag.h                |    5
 fs/xfs/xfs_iget.c              |  473 +++++++++---------------
 fs/xfs/xfs_inode.c             |  140 ++++---
 fs/xfs/xfs_inode.h             |   22 -
 fs/xfs/xfs_itable.c            |   14
 fs/xfs/xfs_mount.c             |    8
 fs/xfs/xfs_mount.h             |   12
 fs/xfs/xfs_vfsops.c            |  617 --------------------------------
 fs/xfs/xfs_vfsops.h            |    2
 fs/xfs/xfs_vnodeops.c          |  118 ------
 include/linux/fs.h             |    2
 24 files changed, 1391 insertions(+), 1541 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 01/28] XFS: remove i_gen from incore inode
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 02/28] XFS: move sync code to its own file Dave Chinner
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

i_gen is incremented in directory operations when the
directory is changed. It is never read or otherwise used
so it should be removed to help reduce the size of the
struct xfs_inode.

The patch also removes a duplicate logging of the directory
inode core. We only need to do this once per transaction
so kill the one associated with the i_gen increment.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_inode.h    |    1 -
 fs/xfs/xfs_rename.c   |   12 ++----------
 fs/xfs/xfs_vnodeops.c |   29 ++---------------------------
 3 files changed, 4 insertions(+), 38 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 1420c49..9559e0b 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -230,7 +230,6 @@ typedef struct xfs_inode {
 	unsigned short		i_flags;	/* see defined flags below */
 	unsigned char		i_update_core;	/* timestamps/size is dirty */
 	unsigned char		i_update_size;	/* di_size field is dirty */
-	unsigned int		i_gen;		/* generation count */
 	unsigned int		i_delayed_blks;	/* count of delay alloc blks */
 
 	xfs_icdinode_t		i_d;		/* most of ondisk inode */
diff --git a/fs/xfs/xfs_rename.c b/fs/xfs/xfs_rename.c
index d700dac..02f0e8f 100644
--- a/fs/xfs/xfs_rename.c
+++ b/fs/xfs/xfs_rename.c
@@ -367,19 +367,11 @@ xfs_rename(
 					&first_block, &free_list, spaceres);
 	if (error)
 		goto abort_return;
-	xfs_ichgtime(src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 
-	/*
-	 * Update the generation counts on all the directory inodes
-	 * that we're modifying.
-	 */
-	src_dp->i_gen++;
+	xfs_ichgtime(src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
-
-	if (new_parent) {
-		target_dp->i_gen++;
+	if (new_parent)
 		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
-	}
 
 	/*
 	 * If this is a synchronous mount, make sure that the
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 588bb4a..0a5999f 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -1625,8 +1625,6 @@ xfs_create(
 		xfs_trans_set_sync(tp);
 	}
 
-	dp->i_gen++;
-
 	/*
 	 * Attach the dquot(s) to the inodes and modify them incore.
 	 * These ids of the inode couldn't have changed since the new
@@ -1985,13 +1983,6 @@ xfs_remove(
 	}
 	xfs_ichgtime(dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 
-	/*
-	 * Bump the in memory generation count on the parent
-	 * directory so that other can know that it has changed.
-	 */
-	dp->i_gen++;
-	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
-
 	if (is_dir) {
 		/*
 		 * Drop the link from ip's "..".
@@ -2009,8 +2000,8 @@ xfs_remove(
 	} else {
 		/*
 		 * When removing a non-directory we need to log the parent
-		 * inode here for the i_gen update.  For a directory this is
-		 * done implicitly by the xfs_droplink call for the ".." entry.
+		 * inode here.  For a directory this is done implicitly
+		 * by the xfs_droplink call for the ".." entry.
 		 */
 		xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
 	}
@@ -2170,7 +2161,6 @@ xfs_link(
 	if (error)
 		goto abort_return;
 	xfs_ichgtime(tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-	tdp->i_gen++;
 	xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
 
 	error = xfs_bumplink(tp, sip);
@@ -2347,18 +2337,10 @@ xfs_mkdir(
 	}
 	xfs_ichgtime(dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 
-	/*
-	 * Bump the in memory version number of the parent directory
-	 * so that other processes accessing it will recognize that
-	 * the directory has changed.
-	 */
-	dp->i_gen++;
-
 	error = xfs_dir_init(tp, cdp, dp);
 	if (error)
 		goto error2;
 
-	cdp->i_gen = 1;
 	error = xfs_bumplink(tp, dp);
 	if (error)
 		goto error2;
@@ -2645,13 +2627,6 @@ xfs_symlink(
 	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
 
 	/*
-	 * Bump the in memory version number of the parent directory
-	 * so that other processes accessing it will recognize that
-	 * the directory has changed.
-	 */
-	dp->i_gen++;
-
-	/*
 	 * If this is a synchronous mount, make sure that the
 	 * symlink transaction goes to disk before returning to
 	 * the user.
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 02/28] XFS: move sync code to its own file
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
  2008-08-19 13:16 ` [PATCH 01/28] XFS: remove i_gen from incore inode Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 04/28] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V3 Dave Chinner
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

The sync code in XFS is spread around several files.
While it used to make sense to have such a distribution,
the code is about to be cleaned up and so centralising it
in one spot as the first step makes sense.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/Makefile              |    1 +
 fs/xfs/linux-2.6/xfs_super.c |    1 +
 fs/xfs/linux-2.6/xfs_sync.c  |  605 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/linux-2.6/xfs_sync.h  |    7 +
 fs/xfs/xfs_vfsops.c          |  560 +--------------------------------------
 fs/xfs/xfs_vfsops.h          |    1 -
 6 files changed, 615 insertions(+), 560 deletions(-)
 create mode 100644 fs/xfs/linux-2.6/xfs_sync.c
 create mode 100644 fs/xfs/linux-2.6/xfs_sync.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 737c9a4..f42ea60 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -106,6 +106,7 @@ xfs-y				+= $(addprefix $(XFS_LINUX)/, \
 				   xfs_iops.o \
 				   xfs_lrw.o \
 				   xfs_super.o \
+				   xfs_sync.o \
 				   xfs_vnode.o \
 				   xfs_xattr.o)
 
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 7d4bdfd..aba1cf0 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -58,6 +58,7 @@
 #include "xfs_extfree_item.h"
 #include "xfs_mru_cache.h"
 #include "xfs_inode_item.h"
+#include "xfs_sync.h"
 
 #include <linux/namei.h>
 #include <linux/init.h>
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
new file mode 100644
index 0000000..c765eb2
--- /dev/null
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -0,0 +1,605 @@
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_dir2.h"
+#include "xfs_dmapi.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_btree.h"
+#include "xfs_dir2_sf.h"
+#include "xfs_attr_sf.h"
+#include "xfs_inode.h"
+#include "xfs_dinode.h"
+#include "xfs_error.h"
+#include "xfs_mru_cache.h"
+#include "xfs_filestream.h"
+#include "xfs_vnodeops.h"
+#include "xfs_utils.h"
+#include "xfs_buf_item.h"
+#include "xfs_inode_item.h"
+#include "xfs_rw.h"
+
+/*
+ * xfs_sync flushes any pending I/O to file system vfsp.
+ *
+ * This routine is called by vfs_sync() to make sure that things make it
+ * out to disk eventually, on sync() system calls to flush out everything,
+ * and when the file system is unmounted.  For the vfs_sync() case, all
+ * we really need to do is sync out the log to make all of our meta-data
+ * updates permanent (except for timestamps).  For calls from pflushd(),
+ * dirty pages are kept moving by calling pdflush() on the inodes
+ * containing them.  We also flush the inodes that we can lock without
+ * sleeping and the superblock if we can lock it without sleeping from
+ * vfs_sync() so that items at the tail of the log are always moving out.
+ *
+ * Flags:
+ *      SYNC_BDFLUSH - We're being called from vfs_sync() so we don't want
+ *		       to sleep if we can help it.  All we really need
+ *		       to do is ensure that the log is synced at least
+ *		       periodically.  We also push the inodes and
+ *		       superblock if we can lock them without sleeping
+ *			and they are not pinned.
+ *      SYNC_ATTR    - We need to flush the inodes.  If SYNC_BDFLUSH is not
+ *		       set, then we really want to lock each inode and flush
+ *		       it.
+ *      SYNC_WAIT    - All the flushes that take place in this call should
+ *		       be synchronous.
+ *      SYNC_DELWRI  - This tells us to push dirty pages associated with
+ *		       inodes.  SYNC_WAIT and SYNC_BDFLUSH are used to
+ *		       determine if they should be flushed sync, async, or
+ *		       delwri.
+ *      SYNC_CLOSE   - This flag is passed when the system is being
+ *		       unmounted.  We should sync and invalidate everything.
+ *      SYNC_FSDATA  - This indicates that the caller would like to make
+ *		       sure the superblock is safe on disk.  We can ensure
+ *		       this by simply making sure the log gets flushed
+ *		       if SYNC_BDFLUSH is set, and by actually writing it
+ *		       out otherwise.
+ *	SYNC_IOWAIT  - The caller wants us to wait for all data I/O to complete
+ *		       before we return (including direct I/O). Forms the drain
+ *		       side of the write barrier needed to safely quiesce the
+ *		       filesystem.
+ *
+ */
+int
+xfs_sync(
+	xfs_mount_t	*mp,
+	int		flags)
+{
+	int		error;
+
+	/*
+	 * Get the Quota Manager to flush the dquots.
+	 *
+	 * If XFS quota support is not enabled or this filesystem
+	 * instance does not use quotas XFS_QM_DQSYNC will always
+	 * return zero.
+	 */
+	error = XFS_QM_DQSYNC(mp, flags);
+	if (error) {
+		/*
+		 * If we got an IO error, we will be shutting down.
+		 * So, there's nothing more for us to do here.
+		 */
+		ASSERT(error != EIO || XFS_FORCED_SHUTDOWN(mp));
+		if (XFS_FORCED_SHUTDOWN(mp))
+			return XFS_ERROR(error);
+	}
+
+	if (flags & SYNC_IOWAIT)
+		xfs_filestream_flush(mp);
+
+	return xfs_syncsub(mp, flags, NULL);
+}
+
+/*
+ * xfs sync routine for internal use
+ *
+ * This routine supports all of the flags defined for the generic vfs_sync
+ * interface as explained above under xfs_sync.
+ *
+ */
+int
+xfs_sync_inodes(
+	xfs_mount_t	*mp,
+	int		flags,
+	int             *bypassed)
+{
+	xfs_inode_t	*ip = NULL;
+	struct inode	*vp = NULL;
+	int		error;
+	int		last_error;
+	uint64_t	fflag;
+	uint		lock_flags;
+	uint		base_lock_flags;
+	boolean_t	mount_locked;
+	boolean_t	vnode_refed;
+	int		preempt;
+	xfs_iptr_t	*ipointer;
+#ifdef DEBUG
+	boolean_t	ipointer_in = B_FALSE;
+
+#define IPOINTER_SET	ipointer_in = B_TRUE
+#define IPOINTER_CLR	ipointer_in = B_FALSE
+#else
+#define IPOINTER_SET
+#define IPOINTER_CLR
+#endif
+
+
+/* Insert a marker record into the inode list after inode ip. The list
+ * must be locked when this is called. After the call the list will no
+ * longer be locked.
+ */
+#define IPOINTER_INSERT(ip, mp)	{ \
+		ASSERT(ipointer_in == B_FALSE); \
+		ipointer->ip_mnext = ip->i_mnext; \
+		ipointer->ip_mprev = ip; \
+		ip->i_mnext = (xfs_inode_t *)ipointer; \
+		ipointer->ip_mnext->i_mprev = (xfs_inode_t *)ipointer; \
+		preempt = 0; \
+		XFS_MOUNT_IUNLOCK(mp); \
+		mount_locked = B_FALSE; \
+		IPOINTER_SET; \
+	}
+
+/* Remove the marker from the inode list. If the marker was the only item
+ * in the list then there are no remaining inodes and we should zero out
+ * the whole list. If we are the current head of the list then move the head
+ * past us.
+ */
+#define IPOINTER_REMOVE(ip, mp)	{ \
+		ASSERT(ipointer_in == B_TRUE); \
+		if (ipointer->ip_mnext != (xfs_inode_t *)ipointer) { \
+			ip = ipointer->ip_mnext; \
+			ip->i_mprev = ipointer->ip_mprev; \
+			ipointer->ip_mprev->i_mnext = ip; \
+			if (mp->m_inodes == (xfs_inode_t *)ipointer) { \
+				mp->m_inodes = ip; \
+			} \
+		} else { \
+			ASSERT(mp->m_inodes == (xfs_inode_t *)ipointer); \
+			mp->m_inodes = NULL; \
+			ip = NULL; \
+		} \
+		IPOINTER_CLR; \
+	}
+
+#define XFS_PREEMPT_MASK	0x7f
+
+	ASSERT(!(flags & SYNC_BDFLUSH));
+
+	if (bypassed)
+		*bypassed = 0;
+	if (mp->m_flags & XFS_MOUNT_RDONLY)
+		return 0;
+	error = 0;
+	last_error = 0;
+	preempt = 0;
+
+	/* Allocate a reference marker */
+	ipointer = (xfs_iptr_t *)kmem_zalloc(sizeof(xfs_iptr_t), KM_SLEEP);
+
+	fflag = XFS_B_ASYNC;		/* default is don't wait */
+	if (flags & SYNC_DELWRI)
+		fflag = XFS_B_DELWRI;
+	if (flags & SYNC_WAIT)
+		fflag = 0;		/* synchronous overrides all */
+
+	base_lock_flags = XFS_ILOCK_SHARED;
+	if (flags & (SYNC_DELWRI | SYNC_CLOSE)) {
+		/*
+		 * We need the I/O lock if we're going to call any of
+		 * the flush/inval routines.
+		 */
+		base_lock_flags |= XFS_IOLOCK_SHARED;
+	}
+
+	XFS_MOUNT_ILOCK(mp);
+
+	ip = mp->m_inodes;
+
+	mount_locked = B_TRUE;
+	vnode_refed  = B_FALSE;
+
+	IPOINTER_CLR;
+
+	do {
+		ASSERT(ipointer_in == B_FALSE);
+		ASSERT(vnode_refed == B_FALSE);
+
+		lock_flags = base_lock_flags;
+
+		/*
+		 * There were no inodes in the list, just break out
+		 * of the loop.
+		 */
+		if (ip == NULL) {
+			break;
+		}
+
+		/*
+		 * We found another sync thread marker - skip it
+		 */
+		if (ip->i_mount == NULL) {
+			ip = ip->i_mnext;
+			continue;
+		}
+
+		vp = VFS_I(ip);
+
+		/*
+		 * If the vnode is gone then this is being torn down,
+		 * call reclaim if it is flushed, else let regular flush
+		 * code deal with it later in the loop.
+		 */
+
+		if (vp == NULL) {
+			/* Skip ones already in reclaim */
+			if (ip->i_flags & XFS_IRECLAIM) {
+				ip = ip->i_mnext;
+				continue;
+			}
+			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0) {
+				ip = ip->i_mnext;
+			} else if ((xfs_ipincount(ip) == 0) &&
+				    xfs_iflock_nowait(ip)) {
+				IPOINTER_INSERT(ip, mp);
+
+				xfs_finish_reclaim(ip, 1,
+						XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+
+				XFS_MOUNT_ILOCK(mp);
+				mount_locked = B_TRUE;
+				IPOINTER_REMOVE(ip, mp);
+			} else {
+				xfs_iunlock(ip, XFS_ILOCK_EXCL);
+				ip = ip->i_mnext;
+			}
+			continue;
+		}
+
+		if (VN_BAD(vp)) {
+			ip = ip->i_mnext;
+			continue;
+		}
+
+		if (XFS_FORCED_SHUTDOWN(mp) && !(flags & SYNC_CLOSE)) {
+			XFS_MOUNT_IUNLOCK(mp);
+			kmem_free(ipointer);
+			return 0;
+		}
+
+		/*
+		 * Try to lock without sleeping.  We're out of order with
+		 * the inode list lock here, so if we fail we need to drop
+		 * the mount lock and try again.  If we're called from
+		 * bdflush() here, then don't bother.
+		 *
+		 * The inode lock here actually coordinates with the
+		 * almost spurious inode lock in xfs_ireclaim() to prevent
+		 * the vnode we handle here without a reference from
+		 * being freed while we reference it.  If we lock the inode
+		 * while it's on the mount list here, then the spurious inode
+		 * lock in xfs_ireclaim() after the inode is pulled from
+		 * the mount list will sleep until we release it here.
+		 * This keeps the vnode from being freed while we reference
+		 * it.
+		 */
+		if (xfs_ilock_nowait(ip, lock_flags) == 0) {
+			if (vp == NULL) {
+				ip = ip->i_mnext;
+				continue;
+			}
+
+			vp = vn_grab(vp);
+			if (vp == NULL) {
+				ip = ip->i_mnext;
+				continue;
+			}
+
+			IPOINTER_INSERT(ip, mp);
+			xfs_ilock(ip, lock_flags);
+
+			ASSERT(vp == VFS_I(ip));
+			ASSERT(ip->i_mount == mp);
+
+			vnode_refed = B_TRUE;
+		}
+
+		/* From here on in the loop we may have a marker record
+		 * in the inode list.
+		 */
+
+		/*
+		 * If we have to flush data or wait for I/O completion
+		 * we need to drop the ilock that we currently hold.
+		 * If we need to drop the lock, insert a marker if we
+		 * have not already done so.
+		 */
+		if ((flags & (SYNC_CLOSE|SYNC_IOWAIT)) ||
+		    ((flags & SYNC_DELWRI) && VN_DIRTY(vp))) {
+			if (mount_locked) {
+				IPOINTER_INSERT(ip, mp);
+			}
+			xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+			if (flags & SYNC_CLOSE) {
+				/* Shutdown case. Flush and invalidate. */
+				if (XFS_FORCED_SHUTDOWN(mp))
+					xfs_tosspages(ip, 0, -1,
+							     FI_REMAPF);
+				else
+					error = xfs_flushinval_pages(ip,
+							0, -1, FI_REMAPF);
+			} else if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
+				error = xfs_flush_pages(ip, 0,
+							-1, fflag, FI_NONE);
+			}
+
+			/*
+			 * When freezing, we need to wait ensure all I/O (including direct
+			 * I/O) is complete to ensure no further data modification can take
+			 * place after this point
+			 */
+			if (flags & SYNC_IOWAIT)
+				vn_iowait(ip);
+
+			xfs_ilock(ip, XFS_ILOCK_SHARED);
+		}
+
+		if ((flags & SYNC_ATTR) &&
+		    (ip->i_update_core ||
+		     (ip->i_itemp && ip->i_itemp->ili_format.ilf_fields))) {
+			if (mount_locked)
+				IPOINTER_INSERT(ip, mp);
+
+			if (flags & SYNC_WAIT) {
+				xfs_iflock(ip);
+				error = xfs_iflush(ip, XFS_IFLUSH_SYNC);
+
+			/*
+			 * If we can't acquire the flush lock, then the inode
+			 * is already being flushed so don't bother waiting.
+			 *
+			 * If we can lock it then do a delwri flush so we can
+			 * combine multiple inode flushes in each disk write.
+			 */
+			} else if (xfs_iflock_nowait(ip)) {
+				error = xfs_iflush(ip, XFS_IFLUSH_DELWRI);
+			} else if (bypassed) {
+				(*bypassed)++;
+			}
+		}
+
+		if (lock_flags != 0) {
+			xfs_iunlock(ip, lock_flags);
+		}
+
+		if (vnode_refed) {
+			/*
+			 * If we had to take a reference on the vnode
+			 * above, then wait until after we've unlocked
+			 * the inode to release the reference.  This is
+			 * because we can be already holding the inode
+			 * lock when IRELE() calls xfs_inactive().
+			 *
+			 * Make sure to drop the mount lock before calling
+			 * IRELE() so that we don't trip over ourselves if
+			 * we have to go for the mount lock again in the
+			 * inactive code.
+			 */
+			if (mount_locked) {
+				IPOINTER_INSERT(ip, mp);
+			}
+
+			IRELE(ip);
+
+			vnode_refed = B_FALSE;
+		}
+
+		if (error) {
+			last_error = error;
+		}
+
+		/*
+		 * bail out if the filesystem is corrupted.
+		 */
+		if (error == EFSCORRUPTED)  {
+			if (!mount_locked) {
+				XFS_MOUNT_ILOCK(mp);
+				IPOINTER_REMOVE(ip, mp);
+			}
+			XFS_MOUNT_IUNLOCK(mp);
+			ASSERT(ipointer_in == B_FALSE);
+			kmem_free(ipointer);
+			return XFS_ERROR(error);
+		}
+
+		/* Let other threads have a chance at the mount lock
+		 * if we have looped many times without dropping the
+		 * lock.
+		 */
+		if ((++preempt & XFS_PREEMPT_MASK) == 0) {
+			if (mount_locked) {
+				IPOINTER_INSERT(ip, mp);
+			}
+		}
+
+		if (mount_locked == B_FALSE) {
+			XFS_MOUNT_ILOCK(mp);
+			mount_locked = B_TRUE;
+			IPOINTER_REMOVE(ip, mp);
+			continue;
+		}
+
+		ASSERT(ipointer_in == B_FALSE);
+		ip = ip->i_mnext;
+
+	} while (ip != mp->m_inodes);
+
+	XFS_MOUNT_IUNLOCK(mp);
+
+	ASSERT(ipointer_in == B_FALSE);
+
+	kmem_free(ipointer);
+	return XFS_ERROR(last_error);
+}
+
+/*
+ * xfs sync routine for internal use
+ *
+ * This routine supports all of the flags defined for the generic vfs_sync
+ * interface as explained above under xfs_sync.
+ *
+ */
+int
+xfs_syncsub(
+	xfs_mount_t	*mp,
+	int		flags,
+	int             *bypassed)
+{
+	int		error = 0;
+	int		last_error = 0;
+	uint		log_flags = XFS_LOG_FORCE;
+	xfs_buf_t	*bp;
+	xfs_buf_log_item_t	*bip;
+
+	/*
+	 * Sync out the log.  This ensures that the log is periodically
+	 * flushed even if there is not enough activity to fill it up.
+	 */
+	if (flags & SYNC_WAIT)
+		log_flags |= XFS_LOG_SYNC;
+
+	xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
+
+	if (flags & (SYNC_ATTR|SYNC_DELWRI)) {
+		if (flags & SYNC_BDFLUSH)
+			xfs_finish_reclaim_all(mp, 1);
+		else
+			error = xfs_sync_inodes(mp, flags, bypassed);
+	}
+
+	/*
+	 * Flushing out dirty data above probably generated more
+	 * log activity, so if this isn't vfs_sync() then flush
+	 * the log again.
+	 */
+	if (flags & SYNC_DELWRI) {
+		xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
+	}
+
+	if (flags & SYNC_FSDATA) {
+		/*
+		 * If this is vfs_sync() then only sync the superblock
+		 * if we can lock it without sleeping and it is not pinned.
+		 */
+		if (flags & SYNC_BDFLUSH) {
+			bp = xfs_getsb(mp, XFS_BUF_TRYLOCK);
+			if (bp != NULL) {
+				bip = XFS_BUF_FSPRIVATE(bp,xfs_buf_log_item_t*);
+				if ((bip != NULL) &&
+				    xfs_buf_item_dirty(bip)) {
+					if (!(XFS_BUF_ISPINNED(bp))) {
+						XFS_BUF_ASYNC(bp);
+						error = xfs_bwrite(mp, bp);
+					} else {
+						xfs_buf_relse(bp);
+					}
+				} else {
+					xfs_buf_relse(bp);
+				}
+			}
+		} else {
+			bp = xfs_getsb(mp, 0);
+			/*
+			 * If the buffer is pinned then push on the log so
+			 * we won't get stuck waiting in the write for
+			 * someone, maybe ourselves, to flush the log.
+			 * Even though we just pushed the log above, we
+			 * did not have the superblock buffer locked at
+			 * that point so it can become pinned in between
+			 * there and here.
+			 */
+			if (XFS_BUF_ISPINNED(bp))
+				xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
+			if (flags & SYNC_WAIT)
+				XFS_BUF_UNASYNC(bp);
+			else
+				XFS_BUF_ASYNC(bp);
+			error = xfs_bwrite(mp, bp);
+		}
+		if (error) {
+			last_error = error;
+		}
+	}
+
+	/*
+	 * Now check to see if the log needs a "dummy" transaction.
+	 */
+	if (!(flags & SYNC_REMOUNT) && xfs_log_need_covered(mp)) {
+		xfs_trans_t *tp;
+		xfs_inode_t *ip;
+
+		/*
+		 * Put a dummy transaction in the log to tell
+		 * recovery that all others are OK.
+		 */
+		tp = xfs_trans_alloc(mp, XFS_TRANS_DUMMY1);
+		if ((error = xfs_trans_reserve(tp, 0,
+				XFS_ICHANGE_LOG_RES(mp),
+				0, 0, 0)))  {
+			xfs_trans_cancel(tp, 0);
+			return error;
+		}
+
+		ip = mp->m_rootip;
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+		xfs_trans_ihold(tp, ip);
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		error = xfs_trans_commit(tp, 0);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
+	}
+
+	/*
+	 * When shutting down, we need to insure that the AIL is pushed
+	 * to disk or the filesystem can appear corrupt from the PROM.
+	 */
+	if ((flags & (SYNC_CLOSE|SYNC_WAIT)) == (SYNC_CLOSE|SYNC_WAIT)) {
+		XFS_bflush(mp->m_ddev_targp);
+		if (mp->m_rtdev_targp) {
+			XFS_bflush(mp->m_rtdev_targp);
+		}
+	}
+
+	return XFS_ERROR(last_error);
+}
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
new file mode 100644
index 0000000..f4c3b1e
--- /dev/null
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -0,0 +1,7 @@
+#ifndef XFS_SYNC_H
+#define XFS_SYNC_H 1
+
+int xfs_sync(struct xfs_mount *mp, int flags);
+int xfs_syncsub(struct xfs_mount *mp, int flags, int *bypassed);
+
+#endif
diff --git a/fs/xfs/xfs_vfsops.c b/fs/xfs/xfs_vfsops.c
index 439dd39..01e274b 100644
--- a/fs/xfs/xfs_vfsops.c
+++ b/fs/xfs/xfs_vfsops.c
@@ -56,6 +56,7 @@
 #include "xfs_vnodeops.h"
 #include "xfs_vfsops.h"
 #include "xfs_utils.h"
+#include "xfs_sync.h"
 
 
 STATIC void
@@ -196,562 +197,3 @@ fscorrupt_out2:
 	return XFS_ERROR(EFSCORRUPTED);
 }
 
-/*
- * xfs_sync flushes any pending I/O to file system vfsp.
- *
- * This routine is called by vfs_sync() to make sure that things make it
- * out to disk eventually, on sync() system calls to flush out everything,
- * and when the file system is unmounted.  For the vfs_sync() case, all
- * we really need to do is sync out the log to make all of our meta-data
- * updates permanent (except for timestamps).  For calls from pflushd(),
- * dirty pages are kept moving by calling pdflush() on the inodes
- * containing them.  We also flush the inodes that we can lock without
- * sleeping and the superblock if we can lock it without sleeping from
- * vfs_sync() so that items at the tail of the log are always moving out.
- *
- * Flags:
- *      SYNC_BDFLUSH - We're being called from vfs_sync() so we don't want
- *		       to sleep if we can help it.  All we really need
- *		       to do is ensure that the log is synced at least
- *		       periodically.  We also push the inodes and
- *		       superblock if we can lock them without sleeping
- *			and they are not pinned.
- *      SYNC_ATTR    - We need to flush the inodes.  If SYNC_BDFLUSH is not
- *		       set, then we really want to lock each inode and flush
- *		       it.
- *      SYNC_WAIT    - All the flushes that take place in this call should
- *		       be synchronous.
- *      SYNC_DELWRI  - This tells us to push dirty pages associated with
- *		       inodes.  SYNC_WAIT and SYNC_BDFLUSH are used to
- *		       determine if they should be flushed sync, async, or
- *		       delwri.
- *      SYNC_CLOSE   - This flag is passed when the system is being
- *		       unmounted.  We should sync and invalidate everything.
- *      SYNC_FSDATA  - This indicates that the caller would like to make
- *		       sure the superblock is safe on disk.  We can ensure
- *		       this by simply making sure the log gets flushed
- *		       if SYNC_BDFLUSH is set, and by actually writing it
- *		       out otherwise.
- *	SYNC_IOWAIT  - The caller wants us to wait for all data I/O to complete
- *		       before we return (including direct I/O). Forms the drain
- *		       side of the write barrier needed to safely quiesce the
- *		       filesystem.
- *
- */
-int
-xfs_sync(
-	xfs_mount_t	*mp,
-	int		flags)
-{
-	int		error;
-
-	/*
-	 * Get the Quota Manager to flush the dquots.
-	 *
-	 * If XFS quota support is not enabled or this filesystem
-	 * instance does not use quotas XFS_QM_DQSYNC will always
-	 * return zero.
-	 */
-	error = XFS_QM_DQSYNC(mp, flags);
-	if (error) {
-		/*
-		 * If we got an IO error, we will be shutting down.
-		 * So, there's nothing more for us to do here.
-		 */
-		ASSERT(error != EIO || XFS_FORCED_SHUTDOWN(mp));
-		if (XFS_FORCED_SHUTDOWN(mp))
-			return XFS_ERROR(error);
-	}
-
-	if (flags & SYNC_IOWAIT)
-		xfs_filestream_flush(mp);
-
-	return xfs_syncsub(mp, flags, NULL);
-}
-
-/*
- * xfs sync routine for internal use
- *
- * This routine supports all of the flags defined for the generic vfs_sync
- * interface as explained above under xfs_sync.
- *
- */
-int
-xfs_sync_inodes(
-	xfs_mount_t	*mp,
-	int		flags,
-	int             *bypassed)
-{
-	xfs_inode_t	*ip = NULL;
-	struct inode	*vp = NULL;
-	int		error;
-	int		last_error;
-	uint64_t	fflag;
-	uint		lock_flags;
-	uint		base_lock_flags;
-	boolean_t	mount_locked;
-	boolean_t	vnode_refed;
-	int		preempt;
-	xfs_iptr_t	*ipointer;
-#ifdef DEBUG
-	boolean_t	ipointer_in = B_FALSE;
-
-#define IPOINTER_SET	ipointer_in = B_TRUE
-#define IPOINTER_CLR	ipointer_in = B_FALSE
-#else
-#define IPOINTER_SET
-#define IPOINTER_CLR
-#endif
-
-
-/* Insert a marker record into the inode list after inode ip. The list
- * must be locked when this is called. After the call the list will no
- * longer be locked.
- */
-#define IPOINTER_INSERT(ip, mp)	{ \
-		ASSERT(ipointer_in == B_FALSE); \
-		ipointer->ip_mnext = ip->i_mnext; \
-		ipointer->ip_mprev = ip; \
-		ip->i_mnext = (xfs_inode_t *)ipointer; \
-		ipointer->ip_mnext->i_mprev = (xfs_inode_t *)ipointer; \
-		preempt = 0; \
-		XFS_MOUNT_IUNLOCK(mp); \
-		mount_locked = B_FALSE; \
-		IPOINTER_SET; \
-	}
-
-/* Remove the marker from the inode list. If the marker was the only item
- * in the list then there are no remaining inodes and we should zero out
- * the whole list. If we are the current head of the list then move the head
- * past us.
- */
-#define IPOINTER_REMOVE(ip, mp)	{ \
-		ASSERT(ipointer_in == B_TRUE); \
-		if (ipointer->ip_mnext != (xfs_inode_t *)ipointer) { \
-			ip = ipointer->ip_mnext; \
-			ip->i_mprev = ipointer->ip_mprev; \
-			ipointer->ip_mprev->i_mnext = ip; \
-			if (mp->m_inodes == (xfs_inode_t *)ipointer) { \
-				mp->m_inodes = ip; \
-			} \
-		} else { \
-			ASSERT(mp->m_inodes == (xfs_inode_t *)ipointer); \
-			mp->m_inodes = NULL; \
-			ip = NULL; \
-		} \
-		IPOINTER_CLR; \
-	}
-
-#define XFS_PREEMPT_MASK	0x7f
-
-	ASSERT(!(flags & SYNC_BDFLUSH));
-
-	if (bypassed)
-		*bypassed = 0;
-	if (mp->m_flags & XFS_MOUNT_RDONLY)
-		return 0;
-	error = 0;
-	last_error = 0;
-	preempt = 0;
-
-	/* Allocate a reference marker */
-	ipointer = (xfs_iptr_t *)kmem_zalloc(sizeof(xfs_iptr_t), KM_SLEEP);
-
-	fflag = XFS_B_ASYNC;		/* default is don't wait */
-	if (flags & SYNC_DELWRI)
-		fflag = XFS_B_DELWRI;
-	if (flags & SYNC_WAIT)
-		fflag = 0;		/* synchronous overrides all */
-
-	base_lock_flags = XFS_ILOCK_SHARED;
-	if (flags & (SYNC_DELWRI | SYNC_CLOSE)) {
-		/*
-		 * We need the I/O lock if we're going to call any of
-		 * the flush/inval routines.
-		 */
-		base_lock_flags |= XFS_IOLOCK_SHARED;
-	}
-
-	XFS_MOUNT_ILOCK(mp);
-
-	ip = mp->m_inodes;
-
-	mount_locked = B_TRUE;
-	vnode_refed  = B_FALSE;
-
-	IPOINTER_CLR;
-
-	do {
-		ASSERT(ipointer_in == B_FALSE);
-		ASSERT(vnode_refed == B_FALSE);
-
-		lock_flags = base_lock_flags;
-
-		/*
-		 * There were no inodes in the list, just break out
-		 * of the loop.
-		 */
-		if (ip == NULL) {
-			break;
-		}
-
-		/*
-		 * We found another sync thread marker - skip it
-		 */
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
-			continue;
-		}
-
-		vp = VFS_I(ip);
-
-		/*
-		 * If the vnode is gone then this is being torn down,
-		 * call reclaim if it is flushed, else let regular flush
-		 * code deal with it later in the loop.
-		 */
-
-		if (vp == NULL) {
-			/* Skip ones already in reclaim */
-			if (ip->i_flags & XFS_IRECLAIM) {
-				ip = ip->i_mnext;
-				continue;
-			}
-			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0) {
-				ip = ip->i_mnext;
-			} else if ((xfs_ipincount(ip) == 0) &&
-				    xfs_iflock_nowait(ip)) {
-				IPOINTER_INSERT(ip, mp);
-
-				xfs_finish_reclaim(ip, 1,
-						XFS_IFLUSH_DELWRI_ELSE_ASYNC);
-
-				XFS_MOUNT_ILOCK(mp);
-				mount_locked = B_TRUE;
-				IPOINTER_REMOVE(ip, mp);
-			} else {
-				xfs_iunlock(ip, XFS_ILOCK_EXCL);
-				ip = ip->i_mnext;
-			}
-			continue;
-		}
-
-		if (VN_BAD(vp)) {
-			ip = ip->i_mnext;
-			continue;
-		}
-
-		if (XFS_FORCED_SHUTDOWN(mp) && !(flags & SYNC_CLOSE)) {
-			XFS_MOUNT_IUNLOCK(mp);
-			kmem_free(ipointer);
-			return 0;
-		}
-
-		/*
-		 * Try to lock without sleeping.  We're out of order with
-		 * the inode list lock here, so if we fail we need to drop
-		 * the mount lock and try again.  If we're called from
-		 * bdflush() here, then don't bother.
-		 *
-		 * The inode lock here actually coordinates with the
-		 * almost spurious inode lock in xfs_ireclaim() to prevent
-		 * the vnode we handle here without a reference from
-		 * being freed while we reference it.  If we lock the inode
-		 * while it's on the mount list here, then the spurious inode
-		 * lock in xfs_ireclaim() after the inode is pulled from
-		 * the mount list will sleep until we release it here.
-		 * This keeps the vnode from being freed while we reference
-		 * it.
-		 */
-		if (xfs_ilock_nowait(ip, lock_flags) == 0) {
-			if (vp == NULL) {
-				ip = ip->i_mnext;
-				continue;
-			}
-
-			vp = vn_grab(vp);
-			if (vp == NULL) {
-				ip = ip->i_mnext;
-				continue;
-			}
-
-			IPOINTER_INSERT(ip, mp);
-			xfs_ilock(ip, lock_flags);
-
-			ASSERT(vp == VFS_I(ip));
-			ASSERT(ip->i_mount == mp);
-
-			vnode_refed = B_TRUE;
-		}
-
-		/* From here on in the loop we may have a marker record
-		 * in the inode list.
-		 */
-
-		/*
-		 * If we have to flush data or wait for I/O completion
-		 * we need to drop the ilock that we currently hold.
-		 * If we need to drop the lock, insert a marker if we
-		 * have not already done so.
-		 */
-		if ((flags & (SYNC_CLOSE|SYNC_IOWAIT)) ||
-		    ((flags & SYNC_DELWRI) && VN_DIRTY(vp))) {
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
-			xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
-			if (flags & SYNC_CLOSE) {
-				/* Shutdown case. Flush and invalidate. */
-				if (XFS_FORCED_SHUTDOWN(mp))
-					xfs_tosspages(ip, 0, -1,
-							     FI_REMAPF);
-				else
-					error = xfs_flushinval_pages(ip,
-							0, -1, FI_REMAPF);
-			} else if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
-				error = xfs_flush_pages(ip, 0,
-							-1, fflag, FI_NONE);
-			}
-
-			/*
-			 * When freezing, we need to wait ensure all I/O (including direct
-			 * I/O) is complete to ensure no further data modification can take
-			 * place after this point
-			 */
-			if (flags & SYNC_IOWAIT)
-				vn_iowait(ip);
-
-			xfs_ilock(ip, XFS_ILOCK_SHARED);
-		}
-
-		if ((flags & SYNC_ATTR) &&
-		    (ip->i_update_core ||
-		     (ip->i_itemp && ip->i_itemp->ili_format.ilf_fields))) {
-			if (mount_locked)
-				IPOINTER_INSERT(ip, mp);
-
-			if (flags & SYNC_WAIT) {
-				xfs_iflock(ip);
-				error = xfs_iflush(ip, XFS_IFLUSH_SYNC);
-
-			/*
-			 * If we can't acquire the flush lock, then the inode
-			 * is already being flushed so don't bother waiting.
-			 *
-			 * If we can lock it then do a delwri flush so we can
-			 * combine multiple inode flushes in each disk write.
-			 */
-			} else if (xfs_iflock_nowait(ip)) {
-				error = xfs_iflush(ip, XFS_IFLUSH_DELWRI);
-			} else if (bypassed) {
-				(*bypassed)++;
-			}
-		}
-
-		if (lock_flags != 0) {
-			xfs_iunlock(ip, lock_flags);
-		}
-
-		if (vnode_refed) {
-			/*
-			 * If we had to take a reference on the vnode
-			 * above, then wait until after we've unlocked
-			 * the inode to release the reference.  This is
-			 * because we can be already holding the inode
-			 * lock when IRELE() calls xfs_inactive().
-			 *
-			 * Make sure to drop the mount lock before calling
-			 * IRELE() so that we don't trip over ourselves if
-			 * we have to go for the mount lock again in the
-			 * inactive code.
-			 */
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
-
-			IRELE(ip);
-
-			vnode_refed = B_FALSE;
-		}
-
-		if (error) {
-			last_error = error;
-		}
-
-		/*
-		 * bail out if the filesystem is corrupted.
-		 */
-		if (error == EFSCORRUPTED)  {
-			if (!mount_locked) {
-				XFS_MOUNT_ILOCK(mp);
-				IPOINTER_REMOVE(ip, mp);
-			}
-			XFS_MOUNT_IUNLOCK(mp);
-			ASSERT(ipointer_in == B_FALSE);
-			kmem_free(ipointer);
-			return XFS_ERROR(error);
-		}
-
-		/* Let other threads have a chance at the mount lock
-		 * if we have looped many times without dropping the
-		 * lock.
-		 */
-		if ((++preempt & XFS_PREEMPT_MASK) == 0) {
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
-		}
-
-		if (mount_locked == B_FALSE) {
-			XFS_MOUNT_ILOCK(mp);
-			mount_locked = B_TRUE;
-			IPOINTER_REMOVE(ip, mp);
-			continue;
-		}
-
-		ASSERT(ipointer_in == B_FALSE);
-		ip = ip->i_mnext;
-
-	} while (ip != mp->m_inodes);
-
-	XFS_MOUNT_IUNLOCK(mp);
-
-	ASSERT(ipointer_in == B_FALSE);
-
-	kmem_free(ipointer);
-	return XFS_ERROR(last_error);
-}
-
-/*
- * xfs sync routine for internal use
- *
- * This routine supports all of the flags defined for the generic vfs_sync
- * interface as explained above under xfs_sync.
- *
- */
-int
-xfs_syncsub(
-	xfs_mount_t	*mp,
-	int		flags,
-	int             *bypassed)
-{
-	int		error = 0;
-	int		last_error = 0;
-	uint		log_flags = XFS_LOG_FORCE;
-	xfs_buf_t	*bp;
-	xfs_buf_log_item_t	*bip;
-
-	/*
-	 * Sync out the log.  This ensures that the log is periodically
-	 * flushed even if there is not enough activity to fill it up.
-	 */
-	if (flags & SYNC_WAIT)
-		log_flags |= XFS_LOG_SYNC;
-
-	xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
-
-	if (flags & (SYNC_ATTR|SYNC_DELWRI)) {
-		if (flags & SYNC_BDFLUSH)
-			xfs_finish_reclaim_all(mp, 1);
-		else
-			error = xfs_sync_inodes(mp, flags, bypassed);
-	}
-
-	/*
-	 * Flushing out dirty data above probably generated more
-	 * log activity, so if this isn't vfs_sync() then flush
-	 * the log again.
-	 */
-	if (flags & SYNC_DELWRI) {
-		xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
-	}
-
-	if (flags & SYNC_FSDATA) {
-		/*
-		 * If this is vfs_sync() then only sync the superblock
-		 * if we can lock it without sleeping and it is not pinned.
-		 */
-		if (flags & SYNC_BDFLUSH) {
-			bp = xfs_getsb(mp, XFS_BUF_TRYLOCK);
-			if (bp != NULL) {
-				bip = XFS_BUF_FSPRIVATE(bp,xfs_buf_log_item_t*);
-				if ((bip != NULL) &&
-				    xfs_buf_item_dirty(bip)) {
-					if (!(XFS_BUF_ISPINNED(bp))) {
-						XFS_BUF_ASYNC(bp);
-						error = xfs_bwrite(mp, bp);
-					} else {
-						xfs_buf_relse(bp);
-					}
-				} else {
-					xfs_buf_relse(bp);
-				}
-			}
-		} else {
-			bp = xfs_getsb(mp, 0);
-			/*
-			 * If the buffer is pinned then push on the log so
-			 * we won't get stuck waiting in the write for
-			 * someone, maybe ourselves, to flush the log.
-			 * Even though we just pushed the log above, we
-			 * did not have the superblock buffer locked at
-			 * that point so it can become pinned in between
-			 * there and here.
-			 */
-			if (XFS_BUF_ISPINNED(bp))
-				xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
-			if (flags & SYNC_WAIT)
-				XFS_BUF_UNASYNC(bp);
-			else
-				XFS_BUF_ASYNC(bp);
-			error = xfs_bwrite(mp, bp);
-		}
-		if (error) {
-			last_error = error;
-		}
-	}
-
-	/*
-	 * Now check to see if the log needs a "dummy" transaction.
-	 */
-	if (!(flags & SYNC_REMOUNT) && xfs_log_need_covered(mp)) {
-		xfs_trans_t *tp;
-		xfs_inode_t *ip;
-
-		/*
-		 * Put a dummy transaction in the log to tell
-		 * recovery that all others are OK.
-		 */
-		tp = xfs_trans_alloc(mp, XFS_TRANS_DUMMY1);
-		if ((error = xfs_trans_reserve(tp, 0,
-				XFS_ICHANGE_LOG_RES(mp),
-				0, 0, 0)))  {
-			xfs_trans_cancel(tp, 0);
-			return error;
-		}
-
-		ip = mp->m_rootip;
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-
-		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-		xfs_trans_ihold(tp, ip);
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-		error = xfs_trans_commit(tp, 0);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
-	}
-
-	/*
-	 * When shutting down, we need to insure that the AIL is pushed
-	 * to disk or the filesystem can appear corrupt from the PROM.
-	 */
-	if ((flags & (SYNC_CLOSE|SYNC_WAIT)) == (SYNC_CLOSE|SYNC_WAIT)) {
-		XFS_bflush(mp->m_ddev_targp);
-		if (mp->m_rtdev_targp) {
-			XFS_bflush(mp->m_rtdev_targp);
-		}
-	}
-
-	return XFS_ERROR(last_error);
-}
diff --git a/fs/xfs/xfs_vfsops.h b/fs/xfs/xfs_vfsops.h
index a74b050..6701d0e 100644
--- a/fs/xfs/xfs_vfsops.h
+++ b/fs/xfs/xfs_vfsops.h
@@ -8,7 +8,6 @@ struct kstatfs;
 struct xfs_mount;
 struct xfs_mount_args;
 
-int xfs_sync(struct xfs_mount *mp, int flags);
 void xfs_do_force_shutdown(struct xfs_mount *mp, int flags, char *fname,
 		int lnnum);
 void xfs_attr_quiesce(struct xfs_mount *mp);
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 04/28] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V3
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
  2008-08-19 13:16 ` [PATCH 01/28] XFS: remove i_gen from incore inode Dave Chinner
  2008-08-19 13:16 ` [PATCH 02/28] XFS: move sync code to its own file Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 05/28] XFS: Use the inode tree for finding dirty inodes V3 Dave Chinner
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

xfs_iflush_all() walks the m_inodes list to find inodes that
need reclaiming. We already have such a list - the m_del_inodes
list. Replace xfs_iflush_all() with a call to xfs_finish_reclaim_all()
and clean up xfs_finish_reclaim_all() to handle the different flush
modes now needed.

Originally based on a patch from Christoph Hellwig.

Version 3
o rediff against new linux-2.6/xfs_sync.c code

Version 2
o revert xfs_syncsub() inode reclaim behaviour back to original
  code
o xfs_quiesce_fs() should use XFS_IFLUSH_DELWRI_ELSE_ASYNC, not
  XFS_IFLUSH_ASYNC, to prevent change of behaviour.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |    2 +-
 fs/xfs/xfs_inode.c          |   35 -----------------------------------
 fs/xfs/xfs_inode.h          |    3 +--
 fs/xfs/xfs_mount.c          |    2 +-
 fs/xfs/xfs_vfsops.c         |    2 +-
 fs/xfs/xfs_vnodeops.c       |   42 ++++++++++++++++++------------------------
 6 files changed, 22 insertions(+), 64 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index a51534c..cd82ba5 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -504,7 +504,7 @@ xfs_syncsub(
 
 	if (flags & (SYNC_ATTR|SYNC_DELWRI)) {
 		if (flags & SYNC_BDFLUSH)
-			xfs_finish_reclaim_all(mp, 1);
+			xfs_finish_reclaim_all(mp, 1, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 		else
 			error = xfs_sync_inodes(mp, flags, bypassed);
 	}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 358511b..97f9d41 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3459,41 +3459,6 @@ corrupt_out:
 }
 
 
-/*
- * Flush all inactive inodes in mp.
- */
-void
-xfs_iflush_all(
-	xfs_mount_t	*mp)
-{
-	xfs_inode_t	*ip;
-
- again:
-	XFS_MOUNT_ILOCK(mp);
-	ip = mp->m_inodes;
-	if (ip == NULL)
-		goto out;
-
-	do {
-		/* Make sure we skip markers inserted by sync */
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
-			continue;
-		}
-
-		if (!VFS_I(ip)) {
-			XFS_MOUNT_IUNLOCK(mp);
-			xfs_finish_reclaim(ip, 0, XFS_IFLUSH_ASYNC);
-			goto again;
-		}
-
-		ASSERT(vn_count(VFS_I(ip)) == 0);
-
-		ip = ip->i_mnext;
-	} while (ip != mp->m_inodes);
- out:
-	XFS_MOUNT_IUNLOCK(mp);
-}
 
 #ifdef XFS_ILOCK_TRACE
 ktrace_t	*xfs_ilock_trace_buf;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 9559e0b..ff68540 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -484,7 +484,7 @@ uint		xfs_ilock_map_shared(xfs_inode_t *);
 void		xfs_iunlock_map_shared(xfs_inode_t *, uint);
 void		xfs_ireclaim(xfs_inode_t *);
 int		xfs_finish_reclaim(xfs_inode_t *, int, int);
-int		xfs_finish_reclaim_all(struct xfs_mount *, int);
+int		xfs_finish_reclaim_all(struct xfs_mount *, int, int);
 
 /*
  * xfs_inode.c prototypes.
@@ -522,7 +522,6 @@ void		xfs_ipin(xfs_inode_t *);
 void		xfs_iunpin(xfs_inode_t *);
 int		xfs_iextents_copy(xfs_inode_t *, xfs_bmbt_rec_t *, int);
 int		xfs_iflush(xfs_inode_t *, uint);
-void		xfs_iflush_all(struct xfs_mount *);
 void		xfs_ichgtime(xfs_inode_t *, int);
 xfs_fsize_t	xfs_file_last_byte(xfs_inode_t *);
 void		xfs_lock_inodes(xfs_inode_t **, int, uint);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a4503f5..e222ab9 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1241,7 +1241,7 @@ xfs_unmountfs(
 	 * need to force the log first.
 	 */
 	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
-	xfs_iflush_all(mp);
+	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_ASYNC);
 
 	XFS_QM_DQPURGEALL(mp, XFS_QMOPT_QUOTALL | XFS_QMOPT_UMOUNTING);
 
diff --git a/fs/xfs/xfs_vfsops.c b/fs/xfs/xfs_vfsops.c
index 01e274b..0c5ee5e 100644
--- a/fs/xfs/xfs_vfsops.c
+++ b/fs/xfs/xfs_vfsops.c
@@ -66,7 +66,7 @@ xfs_quiesce_fs(
 	int			count = 0, pincount;
 
 	xfs_flush_buftarg(mp->m_ddev_targp, 0);
-	xfs_finish_reclaim_all(mp, 0);
+	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 
 	/* This loop must run at least twice.
 	 * The first instance of the loop will flush
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 0a5999f..d6eec3f 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2885,36 +2885,30 @@ xfs_finish_reclaim(
 }
 
 int
-xfs_finish_reclaim_all(xfs_mount_t *mp, int noblock)
+xfs_finish_reclaim_all(
+	xfs_mount_t	*mp,
+	int		 noblock,
+	int		mode)
 {
-	int		purged;
 	xfs_inode_t	*ip, *n;
-	int		done = 0;
 
-	while (!done) {
-		purged = 0;
-		XFS_MOUNT_ILOCK(mp);
-		list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
-			if (noblock) {
-				if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
-					continue;
-				if (xfs_ipincount(ip) ||
-				    !xfs_iflock_nowait(ip)) {
-					xfs_iunlock(ip, XFS_ILOCK_EXCL);
-					continue;
-				}
+restart:
+	XFS_MOUNT_ILOCK(mp);
+	list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
+		if (noblock) {
+			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
+				continue;
+			if (xfs_ipincount(ip) ||
+			    !xfs_iflock_nowait(ip)) {
+				xfs_iunlock(ip, XFS_ILOCK_EXCL);
+				continue;
 			}
-			XFS_MOUNT_IUNLOCK(mp);
-			if (xfs_finish_reclaim(ip, noblock,
-					XFS_IFLUSH_DELWRI_ELSE_ASYNC))
-				delay(1);
-			purged = 1;
-			break;
 		}
-
-		done = !purged;
+		XFS_MOUNT_IUNLOCK(mp);
+		if (xfs_finish_reclaim(ip, noblock, mode))
+			delay(1);
+		goto restart;
 	}
-
 	XFS_MOUNT_IUNLOCK(mp);
 	return 0;
 }
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 05/28] XFS: Use the inode tree for finding dirty inodes V3
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (2 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 04/28] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V3 Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 06/28] XFS: Traverse inode trees when releasing dquots V3 Dave Chinner
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Update xfs_sync_inodes to walk the inode radix tree cache to find
dirty inodes. This removes a huge bunch of nasty, messy code for
traversing the mount inode list safely and removes another user of
the mount inode list.

Version 3
o rediff against new linux-2.6/xfs_sync.c code

Version 2
o add comment explaining use of gang lookups for a single inode
o use IRELE, not VN_RELE
o move check for ag initialisation to caller.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |  361 ++++++++++++-------------------------------
 1 files changed, 101 insertions(+), 260 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index cd82ba5..53d85ec 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -121,356 +121,197 @@ xfs_sync(
 }
 
 /*
- * xfs sync routine for internal use
- *
- * This routine supports all of the flags defined for the generic vfs_sync
- * interface as explained above under xfs_sync.
- *
+ * Sync all the inodes in the given AG according to the
+ * direction given by the flags.
  */
-int
-xfs_sync_inodes(
+STATIC int
+xfs_sync_inodes_ag(
 	xfs_mount_t	*mp,
+	int		ag,
 	int		flags,
-	int             *bypassed)
+	int		*bypassed)
 {
 	xfs_inode_t	*ip = NULL;
 	struct inode	*vp = NULL;
-	int		error;
-	int		last_error;
-	uint64_t	fflag;
-	uint		lock_flags;
-	uint		base_lock_flags;
-	boolean_t	mount_locked;
-	boolean_t	vnode_refed;
-	int		preempt;
-	xfs_iptr_t	*ipointer;
-#ifdef DEBUG
-	boolean_t	ipointer_in = B_FALSE;
-
-#define IPOINTER_SET	ipointer_in = B_TRUE
-#define IPOINTER_CLR	ipointer_in = B_FALSE
-#else
-#define IPOINTER_SET
-#define IPOINTER_CLR
-#endif
-
-
-/* Insert a marker record into the inode list after inode ip. The list
- * must be locked when this is called. After the call the list will no
- * longer be locked.
- */
-#define IPOINTER_INSERT(ip, mp)	{ \
-		ASSERT(ipointer_in == B_FALSE); \
-		ipointer->ip_mnext = ip->i_mnext; \
-		ipointer->ip_mprev = ip; \
-		ip->i_mnext = (xfs_inode_t *)ipointer; \
-		ipointer->ip_mnext->i_mprev = (xfs_inode_t *)ipointer; \
-		preempt = 0; \
-		XFS_MOUNT_IUNLOCK(mp); \
-		mount_locked = B_FALSE; \
-		IPOINTER_SET; \
-	}
-
-/* Remove the marker from the inode list. If the marker was the only item
- * in the list then there are no remaining inodes and we should zero out
- * the whole list. If we are the current head of the list then move the head
- * past us.
- */
-#define IPOINTER_REMOVE(ip, mp)	{ \
-		ASSERT(ipointer_in == B_TRUE); \
-		if (ipointer->ip_mnext != (xfs_inode_t *)ipointer) { \
-			ip = ipointer->ip_mnext; \
-			ip->i_mprev = ipointer->ip_mprev; \
-			ipointer->ip_mprev->i_mnext = ip; \
-			if (mp->m_inodes == (xfs_inode_t *)ipointer) { \
-				mp->m_inodes = ip; \
-			} \
-		} else { \
-			ASSERT(mp->m_inodes == (xfs_inode_t *)ipointer); \
-			mp->m_inodes = NULL; \
-			ip = NULL; \
-		} \
-		IPOINTER_CLR; \
-	}
-
-#define XFS_PREEMPT_MASK	0x7f
-
-	ASSERT(!(flags & SYNC_BDFLUSH));
-
-	if (bypassed)
-		*bypassed = 0;
-	if (mp->m_flags & XFS_MOUNT_RDONLY)
-		return 0;
-	error = 0;
-	last_error = 0;
-	preempt = 0;
-
-	/* Allocate a reference marker */
-	ipointer = (xfs_iptr_t *)kmem_zalloc(sizeof(xfs_iptr_t), KM_SLEEP);
+	xfs_perag_t	*pag = &mp->m_perag[ag];
+	boolean_t	vnode_refed = B_FALSE;
+	int		nr_found;
+	int		first_index = 0;
+	int		error = 0;
+	int		last_error = 0;
+	int		fflag = XFS_B_ASYNC;
+	int		lock_flags = XFS_ILOCK_SHARED;
 
-	fflag = XFS_B_ASYNC;		/* default is don't wait */
 	if (flags & SYNC_DELWRI)
 		fflag = XFS_B_DELWRI;
 	if (flags & SYNC_WAIT)
 		fflag = 0;		/* synchronous overrides all */
 
-	base_lock_flags = XFS_ILOCK_SHARED;
 	if (flags & (SYNC_DELWRI | SYNC_CLOSE)) {
 		/*
 		 * We need the I/O lock if we're going to call any of
 		 * the flush/inval routines.
 		 */
-		base_lock_flags |= XFS_IOLOCK_SHARED;
+		lock_flags |= XFS_IOLOCK_SHARED;
 	}
 
-	XFS_MOUNT_ILOCK(mp);
-
-	ip = mp->m_inodes;
-
-	mount_locked = B_TRUE;
-	vnode_refed  = B_FALSE;
-
-	IPOINTER_CLR;
-
 	do {
-		ASSERT(ipointer_in == B_FALSE);
-		ASSERT(vnode_refed == B_FALSE);
-
-		lock_flags = base_lock_flags;
-
 		/*
-		 * There were no inodes in the list, just break out
-		 * of the loop.
+		 * use a gang lookup to find the next inode in the tree
+		 * as the tree is sparse and a gang lookup walks to find
+		 * the number of objects requested.
 		 */
-		if (ip == NULL) {
-			break;
-		}
+		read_lock(&pag->pag_ici_lock);
+		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+				(void**)&ip, first_index, 1);
 
-		/*
-		 * We found another sync thread marker - skip it
-		 */
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
-			continue;
+		if (!nr_found) {
+			read_unlock(&pag->pag_ici_lock);
+			break;
 		}
 
-		vp = VFS_I(ip);
+		/* update the index for the next lookup */
+		first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
 
 		/*
-		 * If the vnode is gone then this is being torn down,
-		 * call reclaim if it is flushed, else let regular flush
-		 * code deal with it later in the loop.
+		 * skip inodes in reclaim. Let xfs_syncsub do that for
+		 * us so we don't need to worry.
 		 */
-
-		if (vp == NULL) {
-			/* Skip ones already in reclaim */
-			if (ip->i_flags & XFS_IRECLAIM) {
-				ip = ip->i_mnext;
-				continue;
-			}
-			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0) {
-				ip = ip->i_mnext;
-			} else if ((xfs_ipincount(ip) == 0) &&
-				    xfs_iflock_nowait(ip)) {
-				IPOINTER_INSERT(ip, mp);
-
-				xfs_finish_reclaim(ip, 1,
-						XFS_IFLUSH_DELWRI_ELSE_ASYNC);
-
-				XFS_MOUNT_ILOCK(mp);
-				mount_locked = B_TRUE;
-				IPOINTER_REMOVE(ip, mp);
-			} else {
-				xfs_iunlock(ip, XFS_ILOCK_EXCL);
-				ip = ip->i_mnext;
-			}
+		vp = VFS_I(ip);
+		if (!vp) {
+			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
 
+		/* bad inodes are dealt with elsewhere */
 		if (VN_BAD(vp)) {
-			ip = ip->i_mnext;
+			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
 
+		/* nothing to sync during shutdown */
 		if (XFS_FORCED_SHUTDOWN(mp) && !(flags & SYNC_CLOSE)) {
-			XFS_MOUNT_IUNLOCK(mp);
-			kmem_free(ipointer);
+			read_unlock(&pag->pag_ici_lock);
 			return 0;
 		}
 
 		/*
-		 * Try to lock without sleeping.  We're out of order with
-		 * the inode list lock here, so if we fail we need to drop
-		 * the mount lock and try again.  If we're called from
-		 * bdflush() here, then don't bother.
-		 *
-		 * The inode lock here actually coordinates with the
-		 * almost spurious inode lock in xfs_ireclaim() to prevent
-		 * the vnode we handle here without a reference from
-		 * being freed while we reference it.  If we lock the inode
-		 * while it's on the mount list here, then the spurious inode
-		 * lock in xfs_ireclaim() after the inode is pulled from
-		 * the mount list will sleep until we release it here.
-		 * This keeps the vnode from being freed while we reference
-		 * it.
+		 * The inode lock here actually coordinates with the almost
+		 * spurious inode lock in xfs_ireclaim() to prevent the vnode
+		 * we handle here without a reference from being freed while we
+		 * reference it.  If we lock the inode while it's on the mount
+		 * list here, then the spurious inode lock in xfs_ireclaim()
+		 * after the inode is pulled from the mount list will sleep
+		 * until we release it here.  This keeps the vnode from being
+		 * freed while we reference it.
 		 */
 		if (xfs_ilock_nowait(ip, lock_flags) == 0) {
-			if (vp == NULL) {
-				ip = ip->i_mnext;
-				continue;
-			}
-
 			vp = vn_grab(vp);
-			if (vp == NULL) {
-				ip = ip->i_mnext;
+			read_unlock(&pag->pag_ici_lock);
+			if (!vp)
 				continue;
-			}
-
-			IPOINTER_INSERT(ip, mp);
 			xfs_ilock(ip, lock_flags);
 
 			ASSERT(vp == VFS_I(ip));
 			ASSERT(ip->i_mount == mp);
 
 			vnode_refed = B_TRUE;
+		} else {
+			/* safe to unlock here as we have a reference */
+			read_unlock(&pag->pag_ici_lock);
 		}
-
-		/* From here on in the loop we may have a marker record
-		 * in the inode list.
-		 */
-
 		/*
 		 * If we have to flush data or wait for I/O completion
 		 * we need to drop the ilock that we currently hold.
 		 * If we need to drop the lock, insert a marker if we
 		 * have not already done so.
 		 */
-		if ((flags & (SYNC_CLOSE|SYNC_IOWAIT)) ||
-		    ((flags & SYNC_DELWRI) && VN_DIRTY(vp))) {
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
+		if (flags & SYNC_CLOSE) {
 			xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
-			if (flags & SYNC_CLOSE) {
-				/* Shutdown case. Flush and invalidate. */
-				if (XFS_FORCED_SHUTDOWN(mp))
-					xfs_tosspages(ip, 0, -1,
-							     FI_REMAPF);
-				else
-					error = xfs_flushinval_pages(ip,
-							0, -1, FI_REMAPF);
-			} else if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
-				error = xfs_flush_pages(ip, 0,
-							-1, fflag, FI_NONE);
-			}
-
-			/*
-			 * When freezing, we need to wait ensure all I/O (including direct
-			 * I/O) is complete to ensure no further data modification can take
-			 * place after this point
-			 */
+			if (XFS_FORCED_SHUTDOWN(mp))
+				xfs_tosspages(ip, 0, -1, FI_REMAPF);
+			else
+				error = xfs_flushinval_pages(ip, 0, -1,
+							FI_REMAPF);
+			/* wait for I/O on freeze */
 			if (flags & SYNC_IOWAIT)
 				vn_iowait(ip);
 
 			xfs_ilock(ip, XFS_ILOCK_SHARED);
 		}
 
-		if ((flags & SYNC_ATTR) &&
-		    (ip->i_update_core ||
-		     (ip->i_itemp && ip->i_itemp->ili_format.ilf_fields))) {
-			if (mount_locked)
-				IPOINTER_INSERT(ip, mp);
+		if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
+			xfs_iunlock(ip, XFS_ILOCK_SHARED);
+			error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE);
+			if (flags & SYNC_IOWAIT)
+				vn_iowait(ip);
+			xfs_ilock(ip, XFS_ILOCK_SHARED);
+		}
 
+		if ((flags & SYNC_ATTR) && !xfs_inode_clean(ip)) {
 			if (flags & SYNC_WAIT) {
 				xfs_iflock(ip);
-				error = xfs_iflush(ip, XFS_IFLUSH_SYNC);
-
-			/*
-			 * If we can't acquire the flush lock, then the inode
-			 * is already being flushed so don't bother waiting.
-			 *
-			 * If we can lock it then do a delwri flush so we can
-			 * combine multiple inode flushes in each disk write.
-			 */
+				if (!xfs_inode_clean(ip))
+					error = xfs_iflush(ip, XFS_IFLUSH_SYNC);
+				else
+					xfs_ifunlock(ip);
 			} else if (xfs_iflock_nowait(ip)) {
-				error = xfs_iflush(ip, XFS_IFLUSH_DELWRI);
+				if (!xfs_inode_clean(ip))
+					error = xfs_iflush(ip, XFS_IFLUSH_DELWRI);
+				else
+					xfs_ifunlock(ip);
 			} else if (bypassed) {
 				(*bypassed)++;
 			}
 		}
 
-		if (lock_flags != 0) {
+		if (lock_flags)
 			xfs_iunlock(ip, lock_flags);
-		}
 
 		if (vnode_refed) {
-			/*
-			 * If we had to take a reference on the vnode
-			 * above, then wait until after we've unlocked
-			 * the inode to release the reference.  This is
-			 * because we can be already holding the inode
-			 * lock when IRELE() calls xfs_inactive().
-			 *
-			 * Make sure to drop the mount lock before calling
-			 * IRELE() so that we don't trip over ourselves if
-			 * we have to go for the mount lock again in the
-			 * inactive code.
-			 */
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
-
 			IRELE(ip);
-
 			vnode_refed = B_FALSE;
 		}
 
-		if (error) {
+		if (error)
 			last_error = error;
-		}
-
 		/*
 		 * bail out if the filesystem is corrupted.
 		 */
-		if (error == EFSCORRUPTED)  {
-			if (!mount_locked) {
-				XFS_MOUNT_ILOCK(mp);
-				IPOINTER_REMOVE(ip, mp);
-			}
-			XFS_MOUNT_IUNLOCK(mp);
-			ASSERT(ipointer_in == B_FALSE);
-			kmem_free(ipointer);
+		if (error == EFSCORRUPTED)
 			return XFS_ERROR(error);
-		}
-
-		/* Let other threads have a chance at the mount lock
-		 * if we have looped many times without dropping the
-		 * lock.
-		 */
-		if ((++preempt & XFS_PREEMPT_MASK) == 0) {
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
-		}
-
-		if (mount_locked == B_FALSE) {
-			XFS_MOUNT_ILOCK(mp);
-			mount_locked = B_TRUE;
-			IPOINTER_REMOVE(ip, mp);
-			continue;
-		}
 
-		ASSERT(ipointer_in == B_FALSE);
-		ip = ip->i_mnext;
+	} while (nr_found);
 
-	} while (ip != mp->m_inodes);
+	return last_error;
+}
 
-	XFS_MOUNT_IUNLOCK(mp);
+int
+xfs_sync_inodes(
+	xfs_mount_t	*mp,
+	int		flags,
+	int             *bypassed)
+{
+	int		error;
+	int		last_error;
+	int		i;
 
-	ASSERT(ipointer_in == B_FALSE);
+	if (bypassed)
+		*bypassed = 0;
+	if (mp->m_flags & XFS_MOUNT_RDONLY)
+		return 0;
+	error = 0;
+	last_error = 0;
 
-	kmem_free(ipointer);
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		if (!mp->m_perag[i].pag_ici_init)
+			continue;
+		error = xfs_sync_inodes_ag(mp, i, flags, bypassed);
+		if (error)
+			last_error = error;
+		if (error == EFSCORRUPTED)
+			break;
+	}
 	return XFS_ERROR(last_error);
 }
 
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 06/28] XFS: Traverse inode trees when releasing dquots V3
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (3 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 05/28] XFS: Use the inode tree for finding dirty inodes V3 Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 08/28] XFS: split out two helpers from xfs_syncsub Dave Chinner
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Make releasing all inode dquots traverse the per-ag
inode radix trees rather than the mount inode list.
This removes another user of the mount inode list.

Version 3
o fix comment relating to avoiding trying to release the
  quota inodes and those in reclaim.

Version 2
o add comment explaining use of gang lookups for a single inode
o use IRELE, not VN_RELE
o move check for ag initialisation to caller.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/quota/xfs_qm_syscalls.c |  127 ++++++++++++++++++---------------------
 1 files changed, 59 insertions(+), 68 deletions(-)

diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
index 1a3b803..26152b9 100644
--- a/fs/xfs/quota/xfs_qm_syscalls.c
+++ b/fs/xfs/quota/xfs_qm_syscalls.c
@@ -1022,101 +1022,92 @@ xfs_qm_export_flags(
 
 
 /*
- * Go thru all the inodes in the file system, releasing their dquots.
- * Note that the mount structure gets modified to indicate that quotas are off
- * AFTER this, in the case of quotaoff. This also gets called from
- * xfs_rootumount.
+ * Release all the dquots on the inodes in an AG.
  */
-void
-xfs_qm_dqrele_all_inodes(
-	struct xfs_mount *mp,
-	uint		 flags)
+STATIC void
+xfs_qm_dqrele_inodes_ag(
+	xfs_mount_t	*mp,
+	int		ag,
+	uint		flags)
 {
-	xfs_inode_t	*ip, *topino;
-	uint		ireclaims;
-	struct inode	*vp;
-	boolean_t	vnode_refd;
+	xfs_inode_t	*ip = NULL;
+	struct inode	*vp = NULL;
+	xfs_perag_t	*pag = &mp->m_perag[ag];
+	int		first_index = 0;
+	int		nr_found;
 
-	ASSERT(mp->m_quotainfo);
-
-	XFS_MOUNT_ILOCK(mp);
-again:
-	ip = mp->m_inodes;
-	if (ip == NULL) {
-		XFS_MOUNT_IUNLOCK(mp);
-		return;
-	}
 	do {
-		/* Skip markers inserted by xfs_sync */
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
-			continue;
-		}
-		/* Root inode, rbmip and rsumip have associated blocks */
-		if (ip == XFS_QI_UQIP(mp) || ip == XFS_QI_GQIP(mp)) {
-			ASSERT(ip->i_udquot == NULL);
-			ASSERT(ip->i_gdquot == NULL);
-			ip = ip->i_mnext;
-			continue;
+		boolean_t	vnode_refd = B_FALSE;
+
+		/*
+		 * use a gang lookup to find the next inode in the tree
+		 * as the tree is sparse and a gang lookup walks to find
+		 * the number of objects requested.
+		 */
+		read_lock(&pag->pag_ici_lock);
+		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+				(void**)&ip, first_index, 1);
+
+		if (!nr_found) {
+			read_unlock(&pag->pag_ici_lock);
+			break;
 		}
+
+		/* update the index for the next lookup */
+		first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+
+		/* skip quota inodes and those in reclaim */
 		vp = VFS_I(ip);
-		if (!vp) {
+		if (!vp || ip == XFS_QI_UQIP(mp) || ip == XFS_QI_GQIP(mp)) {
 			ASSERT(ip->i_udquot == NULL);
 			ASSERT(ip->i_gdquot == NULL);
-			ip = ip->i_mnext;
+			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
-		vnode_refd = B_FALSE;
 		if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0) {
-			ireclaims = mp->m_ireclaims;
-			topino = mp->m_inodes;
 			vp = vn_grab(vp);
+			read_unlock(&pag->pag_ici_lock);
 			if (!vp)
-				goto again;
-
-			XFS_MOUNT_IUNLOCK(mp);
-			/* XXX restart limit ? */
-			xfs_ilock(ip, XFS_ILOCK_EXCL);
+				continue;
 			vnode_refd = B_TRUE;
+			xfs_ilock(ip, XFS_ILOCK_EXCL);
 		} else {
-			ireclaims = mp->m_ireclaims;
-			topino = mp->m_inodes;
-			XFS_MOUNT_IUNLOCK(mp);
+			read_unlock(&pag->pag_ici_lock);
 		}
-
-		/*
-		 * We don't keep the mountlock across the dqrele() call,
-		 * since it can take a while..
-		 */
 		if ((flags & XFS_UQUOTA_ACCT) && ip->i_udquot) {
 			xfs_qm_dqrele(ip->i_udquot);
 			ip->i_udquot = NULL;
 		}
-		if (flags & (XFS_PQUOTA_ACCT|XFS_GQUOTA_ACCT) && ip->i_gdquot) {
+		if (flags & (XFS_PQUOTA_ACCT|XFS_GQUOTA_ACCT) &&
+		    ip->i_gdquot) {
 			xfs_qm_dqrele(ip->i_gdquot);
 			ip->i_gdquot = NULL;
 		}
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		/*
-		 * Wait until we've dropped the ilock and mountlock to
-		 * do the vn_rele. Or be condemned to an eternity in the
-		 * inactive code in hell.
-		 */
 		if (vnode_refd)
 			IRELE(ip);
-		XFS_MOUNT_ILOCK(mp);
-		/*
-		 * If an inode was inserted or removed, we gotta
-		 * start over again.
-		 */
-		if (topino != mp->m_inodes || mp->m_ireclaims != ireclaims) {
-			/* XXX use a sentinel */
-			goto again;
-		}
-		ip = ip->i_mnext;
-	} while (ip != mp->m_inodes);
+	} while (nr_found);
+}
 
-	XFS_MOUNT_IUNLOCK(mp);
+/*
+ * Go thru all the inodes in the file system, releasing their dquots.
+ * Note that the mount structure gets modified to indicate that quotas are off
+ * AFTER this, in the case of quotaoff. This also gets called from
+ * xfs_rootumount.
+ */
+void
+xfs_qm_dqrele_all_inodes(
+	struct xfs_mount *mp,
+	uint		 flags)
+{
+	int		i;
+
+	ASSERT(mp->m_quotainfo);
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		if (!mp->m_perag[i].pag_ici_init)
+			continue;
+		xfs_qm_dqrele_inodes_ag(mp, i, flags);
+	}
 }
 
 /*------------------------------------------------------------------------*/
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 08/28] XFS: split out two helpers from xfs_syncsub
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (4 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 06/28] XFS: Traverse inode trees when releasing dquots V3 Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 09/28] XFS: Use struct inodes instead of vnodes to kill vn_grab Dave Chinner
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

From: Christoph Hellwig <hch@lst.de>

Split out two helpers from xfs_syncsub for the dummy log commit
and the superblock writeout.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |  162 +++++++++++++++++++++++++------------------
 1 files changed, 93 insertions(+), 69 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 53d85ec..59da332 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -315,6 +315,93 @@ xfs_sync_inodes(
 	return XFS_ERROR(last_error);
 }
 
+STATIC int
+xfs_commit_dummy_trans(
+	struct xfs_mount	*mp,
+	uint			log_flags)
+{
+	struct xfs_inode	*ip = mp->m_rootip;
+	struct xfs_trans	*tp;
+	int			error;
+
+	/*
+	 * Put a dummy transaction in the log to tell recovery
+	 * that all others are OK.
+	 */
+	tp = xfs_trans_alloc(mp, XFS_TRANS_DUMMY1);
+	error = xfs_trans_reserve(tp, 0, XFS_ICHANGE_LOG_RES(mp), 0, 0, 0);
+	if (error) {
+		xfs_trans_cancel(tp, 0);
+		return error;
+	}
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ihold(tp, ip);
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	/* XXX(hch): ignoring the error here.. */
+	error = xfs_trans_commit(tp, 0);
+
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	xfs_log_force(mp, 0, log_flags);
+	return 0;
+}
+
+STATIC int
+xfs_sync_fsdata(
+	struct xfs_mount	*mp,
+	int			flags)
+{
+	struct xfs_buf		*bp;
+	struct xfs_buf_log_item	*bip;
+	int			error = 0;
+
+	/*
+	 * If this is xfssyncd() then only sync the superblock if we can
+	 * lock it without sleeping and it is not pinned.
+	 */
+	if (flags & SYNC_BDFLUSH) {
+		ASSERT(!(flags & SYNC_WAIT));
+
+		bp = xfs_getsb(mp, XFS_BUF_TRYLOCK);
+		if (!bp)
+			goto out;
+
+		bip = XFS_BUF_FSPRIVATE(bp, struct xfs_buf_log_item *);
+		if (!bip || !xfs_buf_item_dirty(bip) || XFS_BUF_ISPINNED(bp))
+			goto out_brelse;
+	} else {
+		bp = xfs_getsb(mp, 0);
+
+		/*
+		 * If the buffer is pinned then push on the log so we won't
+		 * get stuck waiting in the write for someone, maybe
+		 * ourselves, to flush the log.
+		 *
+		 * Even though we just pushed the log above, we did not have
+		 * the superblock buffer locked at that point so it can
+		 * become pinned in between there and here.
+		 */
+		if (XFS_BUF_ISPINNED(bp))
+			xfs_log_force(mp, 0, XFS_LOG_FORCE);
+	}
+
+
+	if (flags & SYNC_WAIT)
+		XFS_BUF_UNASYNC(bp);
+	else
+		XFS_BUF_ASYNC(bp);
+
+	return xfs_bwrite(mp, bp);
+
+ out_brelse:
+	xfs_buf_relse(bp);
+ out:
+	return error;
+}
+
 /*
  * xfs sync routine for internal use
  *
@@ -331,8 +418,6 @@ xfs_syncsub(
 	int		error = 0;
 	int		last_error = 0;
 	uint		log_flags = XFS_LOG_FORCE;
-	xfs_buf_t	*bp;
-	xfs_buf_log_item_t	*bip;
 
 	/*
 	 * Sync out the log.  This ensures that the log is periodically
@@ -355,83 +440,22 @@ xfs_syncsub(
 	 * log activity, so if this isn't vfs_sync() then flush
 	 * the log again.
 	 */
-	if (flags & SYNC_DELWRI) {
-		xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
-	}
+	if (flags & SYNC_DELWRI)
+		xfs_log_force(mp, 0, log_flags);
 
 	if (flags & SYNC_FSDATA) {
-		/*
-		 * If this is vfs_sync() then only sync the superblock
-		 * if we can lock it without sleeping and it is not pinned.
-		 */
-		if (flags & SYNC_BDFLUSH) {
-			bp = xfs_getsb(mp, XFS_BUF_TRYLOCK);
-			if (bp != NULL) {
-				bip = XFS_BUF_FSPRIVATE(bp,xfs_buf_log_item_t*);
-				if ((bip != NULL) &&
-				    xfs_buf_item_dirty(bip)) {
-					if (!(XFS_BUF_ISPINNED(bp))) {
-						XFS_BUF_ASYNC(bp);
-						error = xfs_bwrite(mp, bp);
-					} else {
-						xfs_buf_relse(bp);
-					}
-				} else {
-					xfs_buf_relse(bp);
-				}
-			}
-		} else {
-			bp = xfs_getsb(mp, 0);
-			/*
-			 * If the buffer is pinned then push on the log so
-			 * we won't get stuck waiting in the write for
-			 * someone, maybe ourselves, to flush the log.
-			 * Even though we just pushed the log above, we
-			 * did not have the superblock buffer locked at
-			 * that point so it can become pinned in between
-			 * there and here.
-			 */
-			if (XFS_BUF_ISPINNED(bp))
-				xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
-			if (flags & SYNC_WAIT)
-				XFS_BUF_UNASYNC(bp);
-			else
-				XFS_BUF_ASYNC(bp);
-			error = xfs_bwrite(mp, bp);
-		}
-		if (error) {
+		error = xfs_sync_fsdata(mp, flags);
+		if (error)
 			last_error = error;
-		}
 	}
 
 	/*
 	 * Now check to see if the log needs a "dummy" transaction.
 	 */
 	if (!(flags & SYNC_REMOUNT) && xfs_log_need_covered(mp)) {
-		xfs_trans_t *tp;
-		xfs_inode_t *ip;
-
-		/*
-		 * Put a dummy transaction in the log to tell
-		 * recovery that all others are OK.
-		 */
-		tp = xfs_trans_alloc(mp, XFS_TRANS_DUMMY1);
-		if ((error = xfs_trans_reserve(tp, 0,
-				XFS_ICHANGE_LOG_RES(mp),
-				0, 0, 0)))  {
-			xfs_trans_cancel(tp, 0);
+		error = xfs_commit_dummy_trans(mp, log_flags);
+		if (error)
 			return error;
-		}
-
-		ip = mp->m_rootip;
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-
-		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-		xfs_trans_ihold(tp, ip);
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-		error = xfs_trans_commit(tp, 0);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
 	}
 
 	/*
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 09/28] XFS: Use struct inodes instead of vnodes to kill vn_grab
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (5 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 08/28] XFS: split out two helpers from xfs_syncsub Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 12/28] XFS: xfssyncd: don't call xfs_sync Dave Chinner
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

With the sync code relocated to the linux-2.6 directory we can use
struct inodes directly. If we do the same thing for the quota
release code, we can remove vn_grab altogether.  While here, convert
the VN_BAD() checks to is_bad_inode() so we can remove vnodes
entirely from this code.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c    |   53 +++++++++++++++++++--------------------
 fs/xfs/linux-2.6/xfs_vnode.c   |    6 ++--
 fs/xfs/linux-2.6/xfs_vnode.h   |    5 ----
 fs/xfs/quota/xfs_qm_syscalls.c |   16 ++++++------
 4 files changed, 37 insertions(+), 43 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 59da332..461c1dc 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -131,10 +131,7 @@ xfs_sync_inodes_ag(
 	int		flags,
 	int		*bypassed)
 {
-	xfs_inode_t	*ip = NULL;
-	struct inode	*vp = NULL;
 	xfs_perag_t	*pag = &mp->m_perag[ag];
-	boolean_t	vnode_refed = B_FALSE;
 	int		nr_found;
 	int		first_index = 0;
 	int		error = 0;
@@ -156,6 +153,10 @@ xfs_sync_inodes_ag(
 	}
 
 	do {
+		struct inode	*inode;
+		boolean_t	inode_refed;
+		xfs_inode_t	*ip = NULL;
+
 		/*
 		 * use a gang lookup to find the next inode in the tree
 		 * as the tree is sparse and a gang lookup walks to find
@@ -177,14 +178,14 @@ xfs_sync_inodes_ag(
 		 * skip inodes in reclaim. Let xfs_syncsub do that for
 		 * us so we don't need to worry.
 		 */
-		vp = VFS_I(ip);
-		if (!vp) {
+		if (xfs_iflags_test(ip, (XFS_IRECLAIM|XFS_IRECLAIMABLE))) {
 			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
 
 		/* bad inodes are dealt with elsewhere */
-		if (VN_BAD(vp)) {
+		inode = VFS_I(ip);
+		if (is_bad_inode(inode)) {
 			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
@@ -196,30 +197,29 @@ xfs_sync_inodes_ag(
 		}
 
 		/*
-		 * The inode lock here actually coordinates with the almost
-		 * spurious inode lock in xfs_ireclaim() to prevent the vnode
-		 * we handle here without a reference from being freed while we
-		 * reference it.  If we lock the inode while it's on the mount
-		 * list here, then the spurious inode lock in xfs_ireclaim()
-		 * after the inode is pulled from the mount list will sleep
-		 * until we release it here.  This keeps the vnode from being
-		 * freed while we reference it.
+		 * If we can't get a reference on the VFS_I, the inode must be
+		 * in reclaim. If we can get the inode lock without blocking,
+		 * it is safe to flush the inode because we hold the tree lock
+		 * and xfs_iextract will block right now. Hence if we lock the
+		 * inode while holding the tree lock, xfs_ireclaim() is
+		 * guaranteed to block on the inode lock we now hold and hence
+		 * it is safe to reference the inode until we drop the inode
+		 * locks completely.
 		 */
-		if (xfs_ilock_nowait(ip, lock_flags) == 0) {
-			vp = vn_grab(vp);
+		inode_refed = B_FALSE;
+		if (igrab(inode)) {
 			read_unlock(&pag->pag_ici_lock);
-			if (!vp)
-				continue;
 			xfs_ilock(ip, lock_flags);
-
-			ASSERT(vp == VFS_I(ip));
-			ASSERT(ip->i_mount == mp);
-
-			vnode_refed = B_TRUE;
+			inode_refed = B_TRUE;
 		} else {
-			/* safe to unlock here as we have a reference */
+			if (!xfs_ilock_nowait(ip, lock_flags)) {
+				/* leave it to reclaim */
+				read_unlock(&pag->pag_ici_lock);
+				continue;
+			}
 			read_unlock(&pag->pag_ici_lock);
 		}
+
 		/*
 		 * If we have to flush data or wait for I/O completion
 		 * we need to drop the ilock that we currently hold.
@@ -240,7 +240,7 @@ xfs_sync_inodes_ag(
 			xfs_ilock(ip, XFS_ILOCK_SHARED);
 		}
 
-		if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
+		if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) {
 			xfs_iunlock(ip, XFS_ILOCK_SHARED);
 			error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE);
 			if (flags & SYNC_IOWAIT)
@@ -268,9 +268,8 @@ xfs_sync_inodes_ag(
 		if (lock_flags)
 			xfs_iunlock(ip, lock_flags);
 
-		if (vnode_refed) {
+		if (inode_refed) {
 			IRELE(ip);
-			vnode_refed = B_FALSE;
 		}
 
 		if (error)
diff --git a/fs/xfs/linux-2.6/xfs_vnode.c b/fs/xfs/linux-2.6/xfs_vnode.c
index b52528b..dceb6db 100644
--- a/fs/xfs/linux-2.6/xfs_vnode.c
+++ b/fs/xfs/linux-2.6/xfs_vnode.c
@@ -90,10 +90,10 @@ vn_ioerror(
  */
 static inline int xfs_icount(struct xfs_inode *ip)
 {
-	struct inode *vp = VFS_I(ip);
+	struct inode *inode = VFS_I(ip);
 
-	if (vp)
-		return vn_count(vp);
+	if (!inode)
+		return atomic_read(&inode->i_count);
 	return -1;
 }
 
diff --git a/fs/xfs/linux-2.6/xfs_vnode.h b/fs/xfs/linux-2.6/xfs_vnode.h
index 683ce16..bf89e41 100644
--- a/fs/xfs/linux-2.6/xfs_vnode.h
+++ b/fs/xfs/linux-2.6/xfs_vnode.h
@@ -80,11 +80,6 @@ do { \
 	iput(VFS_I(ip)); \
 } while (0)
 
-static inline struct inode *vn_grab(struct inode *vp)
-{
-	return igrab(vp);
-}
-
 /*
  * Dealing with bad inodes
  */
diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
index 26152b9..4254b07 100644
--- a/fs/xfs/quota/xfs_qm_syscalls.c
+++ b/fs/xfs/quota/xfs_qm_syscalls.c
@@ -1031,13 +1031,13 @@ xfs_qm_dqrele_inodes_ag(
 	uint		flags)
 {
 	xfs_inode_t	*ip = NULL;
-	struct inode	*vp = NULL;
 	xfs_perag_t	*pag = &mp->m_perag[ag];
 	int		first_index = 0;
 	int		nr_found;
 
 	do {
-		boolean_t	vnode_refd = B_FALSE;
+		boolean_t	inode_refed;
+		struct inode	*inode;
 
 		/*
 		 * use a gang lookup to find the next inode in the tree
@@ -1057,19 +1057,19 @@ xfs_qm_dqrele_inodes_ag(
 		first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
 
 		/* skip quota inodes and those in reclaim */
-		vp = VFS_I(ip);
-		if (!vp || ip == XFS_QI_UQIP(mp) || ip == XFS_QI_GQIP(mp)) {
+		inode = VFS_I(ip);
+		if (!inode || ip == XFS_QI_UQIP(mp) || ip == XFS_QI_GQIP(mp)) {
 			ASSERT(ip->i_udquot == NULL);
 			ASSERT(ip->i_gdquot == NULL);
 			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
 		if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0) {
-			vp = vn_grab(vp);
+			inode = igrab(inode);
 			read_unlock(&pag->pag_ici_lock);
-			if (!vp)
+			if (!inode)
 				continue;
-			vnode_refd = B_TRUE;
+			inode_refed = B_TRUE;
 			xfs_ilock(ip, XFS_ILOCK_EXCL);
 		} else {
 			read_unlock(&pag->pag_ici_lock);
@@ -1084,7 +1084,7 @@ xfs_qm_dqrele_inodes_ag(
 			ip->i_gdquot = NULL;
 		}
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		if (vnode_refd)
+		if (inode_refed)
 			IRELE(ip);
 	} while (nr_found);
 }
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 12/28] XFS: xfssyncd: don't call xfs_sync
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (6 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 09/28] XFS: Use struct inodes instead of vnodes to kill vn_grab Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 13/28] XFS: make SYNC_ATTR no longer use xfs_sync Dave Chinner
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Start de-multiplexing xfs_sync() by making xfs_sync_worker()
call the specific sync functions it needs. This is only a small,
unique subset of the entire xfs_sync() code so is easier to
follow.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index d4b7b21..3c31137 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -526,6 +526,11 @@ xfs_flush_device(
 	xfs_log_force(ip->i_mount, (xfs_lsn_t)0, XFS_LOG_FORCE|XFS_LOG_SYNC);
 }
 
+/*
+ * Every sync period we need to unpin all items, reclaim inodes, sync
+ * quota and write out the superblock. We might need to cover the log
+ * to indicate it is idle.
+ */
 STATIC void
 xfs_sync_worker(
 	struct xfs_mount *mp,
@@ -533,8 +538,15 @@ xfs_sync_worker(
 {
 	int		error;
 
-	if (!(mp->m_flags & XFS_MOUNT_RDONLY))
-		error = xfs_sync(mp, SYNC_FSDATA | SYNC_BDFLUSH | SYNC_ATTR);
+	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+		xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
+		xfs_finish_reclaim_all(mp, 1, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+		/* dgc: errors ignored here */
+		error = XFS_QM_DQSYNC(mp, SYNC_BDFLUSH);
+		error = xfs_sync_fsdata(mp, SYNC_BDFLUSH);
+		if (xfs_log_need_covered(mp))
+			error = xfs_commit_dummy_trans(mp, XFS_LOG_FORCE);
+	}
 	mp->m_sync_seq++;
 	wake_up(&mp->m_wait_single_sync_task);
 }
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 13/28] XFS: make SYNC_ATTR no longer use xfs_sync
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (7 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 12/28] XFS: xfssyncd: don't call xfs_sync Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 14/28] XFS: make SYNC_DELWRI " Dave Chinner
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Continue to de-multiplex xfs_sync be replacing all SYNC_ATTR
callers with direct calls xfs_sync_inodes(). Add an assert
into xfs_sync() to ensure we caught all the SYNC_ATTR callers.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_super.c |    3 ++-
 fs/xfs/linux-2.6/xfs_sync.c  |   23 +++++++++++------------
 fs/xfs/linux-2.6/xfs_sync.h  |    1 -
 fs/xfs/xfs_vfsops.c          |    2 +-
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 59f6209..59663f2 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -963,7 +963,8 @@ xfs_fs_put_super(
 	int			error;
 
 	xfs_syncd_stop(mp);
-	xfs_sync(mp, SYNC_ATTR | SYNC_DELWRI);
+	xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+	xfs_sync_inodes(mp, SYNC_ATTR|SYNC_DELWRI);
 
 #ifdef HAVE_DMAPI
 	if (mp->m_flags & XFS_MOUNT_DMAPI) {
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 3c31137..002ccb6 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -342,9 +342,8 @@ xfs_sync_fsdata(
  *		       periodically.  We also push the inodes and
  *		       superblock if we can lock them without sleeping
  *			and they are not pinned.
- *      SYNC_ATTR    - We need to flush the inodes.  If SYNC_BDFLUSH is not
- *		       set, then we really want to lock each inode and flush
- *		       it.
+ *      SYNC_ATTR    - We need to flush the inodes. Now handled by direct calls
+ *		       to xfs_sync_inodes().
  *      SYNC_WAIT    - All the flushes that take place in this call should
  *		       be synchronous.
  *      SYNC_DELWRI  - This tells us to push dirty pages associated with
@@ -373,6 +372,8 @@ xfs_sync(
 	int		last_error = 0;
 	uint		log_flags = XFS_LOG_FORCE;
 
+	ASSERT(!(flags & SYNC_ATTR));
+
 	/*
 	 * Get the Quota Manager to flush the dquots.
 	 *
@@ -403,20 +404,18 @@ xfs_sync(
 
 	xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
 
-	if (flags & (SYNC_ATTR|SYNC_DELWRI)) {
+	if (flags & SYNC_DELWRI) {
 		if (flags & SYNC_BDFLUSH)
 			xfs_finish_reclaim_all(mp, 1, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 		else
 			error = xfs_sync_inodes(mp, flags);
-	}
-
-	/*
-	 * Flushing out dirty data above probably generated more
-	 * log activity, so if this isn't vfs_sync() then flush
-	 * the log again.
-	 */
-	if (flags & SYNC_DELWRI)
+		/*
+		 * Flushing out dirty data above probably generated more
+		 * log activity, so if this isn't vfs_sync() then flush
+		 * the log again.
+		 */
 		xfs_log_force(mp, 0, log_flags);
+	}
 
 	if (flags & SYNC_FSDATA) {
 		error = xfs_sync_fsdata(mp, flags);
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 2954861..5316915 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -49,7 +49,6 @@ typedef struct bhv_vfs_sync_work {
  * to disk (this is the main difference between a sync and a quiesce).
  */
 #define SYNC_DATA_QUIESCE	(SYNC_DELWRI|SYNC_FSDATA|SYNC_WAIT|SYNC_IOWAIT)
-#define SYNC_INODE_QUIESCE	(SYNC_REMOUNT|SYNC_ATTR|SYNC_WAIT)
 
 int xfs_syncd_init(struct xfs_mount *mp);
 void xfs_syncd_stop(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_vfsops.c b/fs/xfs/xfs_vfsops.c
index d5396d6..c82b955 100644
--- a/fs/xfs/xfs_vfsops.c
+++ b/fs/xfs/xfs_vfsops.c
@@ -76,7 +76,7 @@ xfs_quiesce_fs(
 	 */
 	do {
 		xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
-		xfs_sync_inodes(mp, SYNC_INODE_QUIESCE);
+		xfs_sync_inodes(mp, SYNC_ATTR|SYNC_WAIT);
 		pincount = xfs_flush_buftarg(mp->m_ddev_targp, 1);
 		if (!pincount) {
 			delay(50);
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 14/28] XFS: make SYNC_DELWRI no longer use xfs_sync
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (8 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 13/28] XFS: make SYNC_ATTR no longer use xfs_sync Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 16/28] XFS: Kill xfs_sync() Dave Chinner
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Continue to de-multiplex xfs_sync be replacing all SYNC_DELWRI
callers with direct calls functions that do the work. Isolate the
data quiesce case to a function in xfs_sync.c. Isolate the
FSDATA case with explicit calls to xfs_sync_fsdata().

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_super.c |   24 ++++++------------------
 fs/xfs/linux-2.6/xfs_sync.c  |   37 ++++++++++++++++++++++++++++++++++++-
 fs/xfs/linux-2.6/xfs_sync.h  |    3 +++
 3 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 59663f2..a4ff642 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1022,7 +1022,7 @@ xfs_fs_write_super(
 	struct super_block	*sb)
 {
 	if (!(sb->s_flags & MS_RDONLY))
-		xfs_sync(XFS_M(sb), SYNC_FSDATA);
+		xfs_sync_fsdata(XFS_M(sb), 0);
 	sb->s_dirt = 0;
 }
 
@@ -1033,7 +1033,6 @@ xfs_fs_sync_super(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 	int			error;
-	int			flags;
 
 	/*
 	 * Treat a sync operation like a freeze.  This is to work
@@ -1047,20 +1046,10 @@ xfs_fs_sync_super(
 	 * dirty the Linux inode until after the transaction I/O
 	 * completes.
 	 */
-	if (wait || unlikely(sb->s_frozen == SB_FREEZE_WRITE)) {
-		/*
-		 * First stage of freeze - no more writers will make progress
-		 * now we are here, so we flush delwri and delalloc buffers
-		 * here, then wait for all I/O to complete.  Data is frozen at
-		 * that point. Metadata is not frozen, transactions can still
-		 * occur here so don't bother flushing the buftarg (i.e
-		 * SYNC_QUIESCE) because it'll just get dirty again.
-		 */
-		flags = SYNC_DATA_QUIESCE;
-	} else
-		flags = SYNC_FSDATA;
-
-	error = xfs_sync(mp, flags);
+	if (wait || unlikely(sb->s_frozen == SB_FREEZE_WRITE))
+		error = xfs_quiesce_data(mp);
+	else
+		error = xfs_sync_fsdata(mp, 0);
 	sb->s_dirt = 0;
 
 	if (unlikely(laptop_mode)) {
@@ -1178,8 +1167,7 @@ xfs_fs_remount(
 
 	/* rw -> ro */
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
-		xfs_filestream_flush(mp);
-		xfs_sync(mp, SYNC_DATA_QUIESCE);
+		xfs_quiesce_data(mp);
 		xfs_attr_quiesce(mp);
 		mp->m_flags |= XFS_MOUNT_RDONLY;
 	}
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 002ccb6..adbbdd5 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -269,7 +269,7 @@ xfs_commit_dummy_trans(
 	return 0;
 }
 
-STATIC int
+int
 xfs_sync_fsdata(
 	struct xfs_mount	*mp,
 	int			flags)
@@ -323,6 +323,41 @@ xfs_sync_fsdata(
 }
 
 /*
+ * First stage of freeze - no more writers will make progress now we are here,
+ * so we flush delwri and delalloc buffers here, then wait for all I/O to
+ * complete.  Data is frozen at that point. Metadata is not frozen,
+ * transactions can still occur here so don't bother flushing the buftarg (i.e
+ * SYNC_QUIESCE) because it'll just get dirty again.
+ */
+int
+xfs_quiesce_data(
+	struct xfs_mount	*mp)
+{
+	int error;
+
+	/* push non-blocking */
+	xfs_sync_inodes(mp, SYNC_DELWRI|SYNC_BDFLUSH);
+	XFS_QM_DQSYNC(mp, SYNC_BDFLUSH);
+	xfs_filestream_flush(mp);
+
+	/* push and block */
+	xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+	xfs_sync_inodes(mp, SYNC_DELWRI|SYNC_WAIT|SYNC_IOWAIT);
+	xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+	XFS_QM_DQSYNC(mp, SYNC_WAIT);
+
+	/* write superblock and hoover shutdown errors */
+	error = xfs_sync_fsdata(mp, 0);
+
+	/* flush devices */
+	XFS_bflush(mp->m_ddev_targp);
+	if (mp->m_rtdev_targp)
+		XFS_bflush(mp->m_rtdev_targp);
+
+	return error;
+}
+
+/*
  * xfs_sync flushes any pending I/O to file system vfsp.
  *
  * This routine is called by vfs_sync() to make sure that things make it
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 5316915..fcd4040 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -55,6 +55,9 @@ void xfs_syncd_stop(struct xfs_mount *mp);
 
 int xfs_sync(struct xfs_mount *mp, int flags);
 int xfs_sync_inodes(struct xfs_mount *mp, int flags);
+int xfs_sync_fsdata(struct xfs_mount *mp, int flags);
+
+int xfs_quiesce_data(struct xfs_mount *mp);
 
 void xfs_flush_inode(struct xfs_inode *ip);
 void xfs_flush_device(struct xfs_inode *ip);
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 16/28] XFS: Kill xfs_sync()
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (9 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 14/28] XFS: make SYNC_DELWRI " Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 17/28] XFS: Move remaining quiesce code Dave Chinner
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

There are no more callers to xfs_sync() now, so remove the
function altogther.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |  132 +++++--------------------------------------
 fs/xfs/linux-2.6/xfs_sync.h |   25 +-------
 fs/xfs/quota/xfs_qm.c       |   10 +--
 fs/xfs/xfs_iget.c           |   15 ++---
 4 files changed, 29 insertions(+), 153 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index f9b0816..c134cad 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -309,11 +309,21 @@ xfs_sync_fsdata(
 }
 
 /*
- * First stage of freeze - no more writers will make progress now we are here,
+ * When remounting a filesystem read-only or freezing the filesystem, we have
+ * two phases to execute. This first phase is syncing the data before we
+ * quiesce the filesystem, and the second is flushing all the inodes out after
+ * we've waited for all the transactions created by the first phase to
+ * complete. The second phase ensures that the inodes are written to their
+ * location on disk rather than just existing in transactions in the log. This
+ * means after a quiesce there is no log replay required to write the inodes to
+ * disk (this is the main difference between a sync and a quiesce).
+ */
+/*
+ * First stage of freeze - no writers will make progress now we are here,
  * so we flush delwri and delalloc buffers here, then wait for all I/O to
  * complete.  Data is frozen at that point. Metadata is not frozen,
- * transactions can still occur here so don't bother flushing the buftarg (i.e
- * SYNC_QUIESCE) because it'll just get dirty again.
+ * transactions can still occur here so don't bother flushing the buftarg
+ * because it'll just get dirty again.
  */
 int
 xfs_quiesce_data(
@@ -332,11 +342,10 @@ xfs_quiesce_data(
 	xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
 	XFS_QM_DQSYNC(mp, SYNC_WAIT);
 
-	/* write superblock and hoover shutdown errors */
+	/* write superblock and hoover up shutdown errors */
 	error = xfs_sync_fsdata(mp, 0);
 
-	/* flush devices */
-	XFS_bflush(mp->m_ddev_targp);
+	/* flush data-only devices */
 	if (mp->m_rtdev_targp)
 		XFS_bflush(mp->m_rtdev_targp);
 
@@ -344,117 +353,6 @@ xfs_quiesce_data(
 }
 
 /*
- * xfs_sync flushes any pending I/O to file system vfsp.
- *
- * This routine is called by vfs_sync() to make sure that things make it
- * out to disk eventually, on sync() system calls to flush out everything,
- * and when the file system is unmounted.  For the vfs_sync() case, all
- * we really need to do is sync out the log to make all of our meta-data
- * updates permanent (except for timestamps).  For calls from pflushd(),
- * dirty pages are kept moving by calling pdflush() on the inodes
- * containing them.  We also flush the inodes that we can lock without
- * sleeping and the superblock if we can lock it without sleeping from
- * vfs_sync() so that items at the tail of the log are always moving out.
- *
- * Flags:
- *      SYNC_BDFLUSH - We're being called from vfs_sync() so we don't want
- *		       to sleep if we can help it.  All we really need
- *		       to do is ensure that the log is synced at least
- *		       periodically.  We also push the inodes and
- *		       superblock if we can lock them without sleeping
- *			and they are not pinned.
- *      SYNC_ATTR    - We need to flush the inodes. Now handled by direct calls
- *		       to xfs_sync_inodes().
- *      SYNC_WAIT    - All the flushes that take place in this call should
- *		       be synchronous.
- *      SYNC_DELWRI  - This tells us to push dirty pages associated with
- *		       inodes.  SYNC_WAIT and SYNC_BDFLUSH are used to
- *		       determine if they should be flushed sync, async, or
- *		       delwri.
- *      SYNC_FSDATA  - This indicates that the caller would like to make
- *		       sure the superblock is safe on disk.  We can ensure
- *		       this by simply making sure the log gets flushed
- *		       if SYNC_BDFLUSH is set, and by actually writing it
- *		       out otherwise.
- *	SYNC_IOWAIT  - The caller wants us to wait for all data I/O to complete
- *		       before we return (including direct I/O). Forms the drain
- *		       side of the write barrier needed to safely quiesce the
- *		       filesystem.
- *
- */
-int
-xfs_sync(
-	xfs_mount_t	*mp,
-	int		flags)
-{
-	int		error;
-	int		last_error = 0;
-	uint		log_flags = XFS_LOG_FORCE;
-
-	ASSERT(!(flags & SYNC_ATTR));
-
-	/*
-	 * Get the Quota Manager to flush the dquots.
-	 *
-	 * If XFS quota support is not enabled or this filesystem
-	 * instance does not use quotas XFS_QM_DQSYNC will always
-	 * return zero.
-	 */
-	error = XFS_QM_DQSYNC(mp, flags);
-	if (error) {
-		/*
-		 * If we got an IO error, we will be shutting down.
-		 * So, there's nothing more for us to do here.
-		 */
-		ASSERT(error != EIO || XFS_FORCED_SHUTDOWN(mp));
-		if (XFS_FORCED_SHUTDOWN(mp))
-			return XFS_ERROR(error);
-	}
-
-	if (flags & SYNC_IOWAIT)
-		xfs_filestream_flush(mp);
-
-	/*
-	 * Sync out the log.  This ensures that the log is periodically
-	 * flushed even if there is not enough activity to fill it up.
-	 */
-	if (flags & SYNC_WAIT)
-		log_flags |= XFS_LOG_SYNC;
-
-	xfs_log_force(mp, (xfs_lsn_t)0, log_flags);
-
-	if (flags & SYNC_DELWRI) {
-		if (flags & SYNC_BDFLUSH)
-			xfs_finish_reclaim_all(mp, 1, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
-		else
-			error = xfs_sync_inodes(mp, flags);
-		/*
-		 * Flushing out dirty data above probably generated more
-		 * log activity, so if this isn't vfs_sync() then flush
-		 * the log again.
-		 */
-		xfs_log_force(mp, 0, log_flags);
-	}
-
-	if (flags & SYNC_FSDATA) {
-		error = xfs_sync_fsdata(mp, flags);
-		if (error)
-			last_error = error;
-	}
-
-	/*
-	 * Now check to see if the log needs a "dummy" transaction.
-	 */
-	if (!(flags & SYNC_REMOUNT) && xfs_log_need_covered(mp)) {
-		error = xfs_commit_dummy_trans(mp, log_flags);
-		if (error)
-			return error;
-	}
-
-	return XFS_ERROR(last_error);
-}
-
-/*
  * Enqueue a work item to be picked up by the vfs xfssyncd thread.
  * Doing this has two advantages:
  * - It saves on stack space, which is tight in certain situations
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 2509db0..4591dc0 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -28,31 +28,14 @@ typedef struct bhv_vfs_sync_work {
 } bhv_vfs_sync_work_t;
 
 #define SYNC_ATTR		0x0001	/* sync attributes */
-#define SYNC_DELWRI		0x0004	/* look at delayed writes */
-#define SYNC_WAIT		0x0008	/* wait for i/o to complete */
-#define SYNC_BDFLUSH		0x0010	/* BDFLUSH is calling -- don't block */
-#define SYNC_FSDATA		0x0020	/* flush fs data (e.g. superblocks) */
-#define SYNC_REFCACHE		0x0040  /* prune some of the nfs ref cache */
-#define SYNC_REMOUNT		0x0080  /* remount readonly, no dummy LRs */
-#define SYNC_IOWAIT		0x0100  /* wait for all I/O to complete */
-
-/*
- * When remounting a filesystem read-only or freezing the filesystem,
- * we have two phases to execute. This first phase is syncing the data
- * before we quiesce the fielsystem, and the second is flushing all the
- * inodes out after we've waited for all the transactions created by
- * the first phase to complete. The second phase uses SYNC_INODE_QUIESCE
- * to ensure that the inodes are written to their location on disk
- * rather than just existing in transactions in the log. This means
- * after a quiesce there is no log replay required to write the inodes
- * to disk (this is the main difference between a sync and a quiesce).
- */
-#define SYNC_DATA_QUIESCE	(SYNC_DELWRI|SYNC_FSDATA|SYNC_WAIT|SYNC_IOWAIT)
+#define SYNC_DELWRI		0x0002	/* look at delayed writes */
+#define SYNC_WAIT		0x0004	/* wait for i/o to complete */
+#define SYNC_BDFLUSH		0x0008	/* BDFLUSH is calling -- don't block */
+#define SYNC_IOWAIT		0x0010  /* wait for all I/O to complete */
 
 int xfs_syncd_init(struct xfs_mount *mp);
 void xfs_syncd_stop(struct xfs_mount *mp);
 
-int xfs_sync(struct xfs_mount *mp, int flags);
 int xfs_sync_inodes(struct xfs_mount *mp, int flags);
 int xfs_sync_fsdata(struct xfs_mount *mp, int flags);
 
diff --git a/fs/xfs/quota/xfs_qm.c b/fs/xfs/quota/xfs_qm.c
index df0ffef..46465e7 100644
--- a/fs/xfs/quota/xfs_qm.c
+++ b/fs/xfs/quota/xfs_qm.c
@@ -987,14 +987,10 @@ xfs_qm_dqdetach(
 }
 
 /*
- * This is called by VFS_SYNC and flags arg determines the caller,
- * and its motives, as done in xfs_sync.
- *
- * vfs_sync: SYNC_FSDATA|SYNC_ATTR|SYNC_BDFLUSH 0x31
- * syscall sync: SYNC_FSDATA|SYNC_ATTR|SYNC_DELWRI 0x25
- * umountroot : SYNC_WAIT | SYNC_CLOSE | SYNC_ATTR | SYNC_FSDATA
+ * This is called to sync quotas. We can be told to use non-blocking
+ * semantics by either the SYNC_BDFLUSH flag or the absence of the
+ * SYNC_WAIT flag.
  */
-
 int
 xfs_qm_sync(
 	xfs_mount_t	*mp,
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 60554ce..7932551 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -443,14 +443,13 @@ xfs_ireclaim(xfs_inode_t *ip)
 	xfs_iextract(ip);
 
 	/*
-	 * Here we do a spurious inode lock in order to coordinate with
-	 * xfs_sync().  This is because xfs_sync() references the inodes
-	 * in the mount list without taking references on the corresponding
-	 * vnodes.  We make that OK here by ensuring that we wait until
-	 * the inode is unlocked in xfs_sync() before we go ahead and
-	 * free it.  We get both the regular lock and the io lock because
-	 * the xfs_sync() code may need to drop the regular one but will
-	 * still hold the io lock.
+	 * Here we do a spurious inode lock in order to coordinate with inode
+	 * cache radix tree lookups.  This is because the lookup can reference
+	 * the inodes in the cache without taking references.  We make that OK
+	 * here by ensuring that we wait until the inode is unlocked after the
+	 * lookup before we go ahead and free it.  We get both the ilock and
+	 * the iolock because the code may need to drop the ilock one but will
+	 * still hold the iolock.
 	 */
 	xfs_ilock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
 
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 17/28] XFS: Move remaining quiesce code.
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (10 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 16/28] XFS: Kill xfs_sync() Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 20/28] XFS: Never call mark_inode_dirty_sync() directly Dave Chinner
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

With all the other filesystem sync code it in xfs_sync.c
including the data quiesce code, it makes sense to move
the remaining quiesce code to the same place.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_super.c |    6 ++--
 fs/xfs/linux-2.6/xfs_sync.c  |   56 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/linux-2.6/xfs_sync.h  |    1 +
 fs/xfs/xfs_vfsops.c          |   57 ------------------------------------------
 fs/xfs/xfs_vfsops.h          |    1 -
 5 files changed, 60 insertions(+), 61 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 1846ea7..8c47ed1 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1158,7 +1158,7 @@ xfs_fs_remount(
 	/* rw -> ro */
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
 		xfs_quiesce_data(mp);
-		xfs_attr_quiesce(mp);
+		xfs_quiesce_attr(mp);
 		mp->m_flags |= XFS_MOUNT_RDONLY;
 	}
 
@@ -1167,7 +1167,7 @@ xfs_fs_remount(
 
 /*
  * Second stage of a freeze. The data is already frozen so we only
- * need to take care of themetadata. Once that's done write a dummy
+ * need to take care of the metadata. Once that's done write a dummy
  * record to dirty the log in case of a crash while frozen.
  */
 STATIC void
@@ -1176,7 +1176,7 @@ xfs_fs_lockfs(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
-	xfs_attr_quiesce(mp);
+	xfs_quiesce_attr(mp);
 	xfs_fs_log_dummy(mp);
 }
 
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index c134cad..98a33a5 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -352,6 +352,62 @@ xfs_quiesce_data(
 	return error;
 }
 
+STATIC void
+xfs_quiesce_fs(
+	struct xfs_mount	*mp)
+{
+	int	count = 0, pincount;
+
+	xfs_flush_buftarg(mp->m_ddev_targp, 0);
+	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+
+	/*
+	 * This loop must run at least twice.  The first instance of the loop
+	 * will flush most meta data but that will generate more meta data
+	 * (typically directory updates).  Which then must be flushed and
+	 * logged before we can write the unmount record.
+	 */
+	do {
+		xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+		xfs_sync_inodes(mp, SYNC_ATTR|SYNC_WAIT);
+		pincount = xfs_flush_buftarg(mp->m_ddev_targp, 1);
+		if (!pincount) {
+			delay(50);
+			count++;
+		}
+	} while (count < 2);
+}
+
+/*
+ * Second stage of a quiesce. The data is already synced, now we have to take
+ * care of the metadata. New transactions are already blocked, so we need to
+ * wait for any remaining transactions to drain out before proceding.
+ */
+void
+xfs_quiesce_attr(
+	struct xfs_mount	*mp)
+{
+	int	error = 0;
+
+	/* wait for all modifications to complete */
+	while (atomic_read(&mp->m_active_trans) > 0)
+		delay(100);
+
+	/* flush inodes and push all remaining buffers out to disk */
+	xfs_quiesce_fs(mp);
+
+	ASSERT_ALWAYS(atomic_read(&mp->m_active_trans) == 0);
+
+	/* Push the superblock and write an unmount record */
+	error = xfs_log_sbcount(mp, 1);
+	if (error)
+		xfs_fs_cmn_err(CE_WARN, mp,
+				"xfs_attr_quiesce: failed to log sb changes. "
+				"Frozen image may not be consistent.");
+	xfs_log_unmount_write(mp);
+	xfs_unmountfs_writesb(mp);
+}
+
 /*
  * Enqueue a work item to be picked up by the vfs xfssyncd thread.
  * Doing this has two advantages:
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 4591dc0..3b49aa3 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -40,6 +40,7 @@ int xfs_sync_inodes(struct xfs_mount *mp, int flags);
 int xfs_sync_fsdata(struct xfs_mount *mp, int flags);
 
 int xfs_quiesce_data(struct xfs_mount *mp);
+void xfs_quiesce_attr(struct xfs_mount *mp);
 
 void xfs_flush_inode(struct xfs_inode *ip);
 void xfs_flush_device(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_vfsops.c b/fs/xfs/xfs_vfsops.c
index c82b955..72b3265 100644
--- a/fs/xfs/xfs_vfsops.c
+++ b/fs/xfs/xfs_vfsops.c
@@ -58,63 +58,6 @@
 #include "xfs_utils.h"
 #include "xfs_sync.h"
 
-
-STATIC void
-xfs_quiesce_fs(
-	xfs_mount_t		*mp)
-{
-	int			count = 0, pincount;
-
-	xfs_flush_buftarg(mp->m_ddev_targp, 0);
-	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
-
-	/*
-	 * This loop must run at least twice.  The first instance of the loop
-	 * will flush most meta data but that will generate more meta data
-	 * (typically directory updates).  Which then must be flushed and
-	 * logged before we can write the unmount record.
-	 */
-	do {
-		xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
-		xfs_sync_inodes(mp, SYNC_ATTR|SYNC_WAIT);
-		pincount = xfs_flush_buftarg(mp->m_ddev_targp, 1);
-		if (!pincount) {
-			delay(50);
-			count++;
-		}
-	} while (count < 2);
-}
-
-/*
- * Second stage of a quiesce. The data is already synced, now we have to take
- * care of the metadata. New transactions are already blocked, so we need to
- * wait for any remaining transactions to drain out before proceding.
- */
-void
-xfs_attr_quiesce(
-	xfs_mount_t	*mp)
-{
-	int	error = 0;
-
-	/* wait for all modifications to complete */
-	while (atomic_read(&mp->m_active_trans) > 0)
-		delay(100);
-
-	/* flush inodes and push all remaining buffers out to disk */
-	xfs_quiesce_fs(mp);
-
-	ASSERT_ALWAYS(atomic_read(&mp->m_active_trans) == 0);
-
-	/* Push the superblock and write an unmount record */
-	error = xfs_log_sbcount(mp, 1);
-	if (error)
-		xfs_fs_cmn_err(CE_WARN, mp,
-				"xfs_attr_quiesce: failed to log sb changes. "
-				"Frozen image may not be consistent.");
-	xfs_log_unmount_write(mp);
-	xfs_unmountfs_writesb(mp);
-}
-
 /*
  * xfs_unmount_flush implements a set of flush operation on special
  * inodes, which are needed as a separate set of operations so that
diff --git a/fs/xfs/xfs_vfsops.h b/fs/xfs/xfs_vfsops.h
index 6701d0e..6b8e0b5 100644
--- a/fs/xfs/xfs_vfsops.h
+++ b/fs/xfs/xfs_vfsops.h
@@ -10,6 +10,5 @@ struct xfs_mount_args;
 
 void xfs_do_force_shutdown(struct xfs_mount *mp, int flags, char *fname,
 		int lnnum);
-void xfs_attr_quiesce(struct xfs_mount *mp);
 
 #endif /* _XFS_VFSOPS_H */
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 20/28] XFS: Never call mark_inode_dirty_sync() directly
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (11 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 17/28] XFS: Move remaining quiesce code Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 21/28] Inode: Allow external initialisers Dave Chinner
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Once the Linux inode and the XFS inode are combined, we cannot rely
on just check if the linux inode exists as a method of determining
if it is valid or not. Hence we should always call
xfs_mark_inode_dirty_sync() instead as it does the correct checks to
determine if the liinux inode is in a valid state or not.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_aops.c  |    2 +-
 fs/xfs/linux-2.6/xfs_iops.c  |    2 +-
 fs/xfs/linux-2.6/xfs_super.c |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index f42f80a..d932579 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -191,7 +191,7 @@ xfs_setfilesize(
 		ip->i_d.di_size = isize;
 		ip->i_update_core = 1;
 		ip->i_update_size = 1;
-		mark_inode_dirty_sync(ioend->io_inode);
+		xfs_mark_inode_dirty_sync(ip);
 	}
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 91bcd97..600165f 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -128,7 +128,7 @@ xfs_ichgtime(
 	if (sync_it) {
 		SYNCHRONIZE();
 		ip->i_update_core = 1;
-		mark_inode_dirty_sync(inode);
+		xfs_mark_inode_dirty_sync(ip);
 	}
 }
 
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 0da7879..ed12c5e 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -949,7 +949,7 @@ xfs_fs_write_inode(
 	 * it dirty again so we'll try again later.
 	 */
 	if (error)
-		mark_inode_dirty_sync(inode);
+		xfs_mark_inode_dirty_sync(XFS_I(inode));
 
 	return -error;
 }
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 21/28] Inode: Allow external initialisers
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (12 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 20/28] XFS: Never call mark_inode_dirty_sync() directly Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 23/28] XFS: Combine the XFS and Linux inodes Dave Chinner
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

To allow XFS to combine the XFS and linux inodes into a single
structure, we need to drive inode lookup from the XFS inode cache,
not the generic inode cache. This means that we need initialise a
struct inode from a context outside alloc_inode() as it is no longer
used by XFS.

Factor and export the struct inode initialisation code from
alloc_inode() to inode_init_always() as a counterpart to
inode_init_once().  i.e. we have to call this init function for each
inode instantiation (always), as opposed inode_init_once() which is
only called on slab object instantiation (once).

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/inode.c         |  146 +++++++++++++++++++++++++++++-----------------------
 include/linux/fs.h |    1 +
 2 files changed, 82 insertions(+), 65 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index b6726f6..1ec2b5b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -108,83 +108,99 @@ static void wake_up_inode(struct inode *inode)
 	wake_up_bit(&inode->i_state, __I_LOCK);
 }
 
-static struct inode *alloc_inode(struct super_block *sb)
+/**
+ * inode_init_always - perform inode structure intialisation
+ * @sb		- superblock inode belongs to.
+ * @inode	- inode to initialise
+ *
+ * These are initializations that need to be done on every inode
+ * allocation as the fields are not initialised by slab allocation.
+ */
+struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
 {
 	static const struct address_space_operations empty_aops;
 	static struct inode_operations empty_iops;
 	static const struct file_operations empty_fops;
-	struct inode *inode;
 
-	if (sb->s_op->alloc_inode)
-		inode = sb->s_op->alloc_inode(sb);
-	else
-		inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);
-
-	if (inode) {
-		struct address_space * const mapping = &inode->i_data;
-
-		inode->i_sb = sb;
-		inode->i_blkbits = sb->s_blocksize_bits;
-		inode->i_flags = 0;
-		atomic_set(&inode->i_count, 1);
-		inode->i_op = &empty_iops;
-		inode->i_fop = &empty_fops;
-		inode->i_nlink = 1;
-		atomic_set(&inode->i_writecount, 0);
-		inode->i_size = 0;
-		inode->i_blocks = 0;
-		inode->i_bytes = 0;
-		inode->i_generation = 0;
+	struct address_space * const mapping = &inode->i_data;
+
+	inode->i_sb = sb;
+	inode->i_blkbits = sb->s_blocksize_bits;
+	inode->i_flags = 0;
+	atomic_set(&inode->i_count, 1);
+	inode->i_op = &empty_iops;
+	inode->i_fop = &empty_fops;
+	inode->i_nlink = 1;
+	atomic_set(&inode->i_writecount, 0);
+	inode->i_size = 0;
+	inode->i_blocks = 0;
+	inode->i_bytes = 0;
+	inode->i_generation = 0;
 #ifdef CONFIG_QUOTA
-		memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
+	memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
 #endif
-		inode->i_pipe = NULL;
-		inode->i_bdev = NULL;
-		inode->i_cdev = NULL;
-		inode->i_rdev = 0;
-		inode->dirtied_when = 0;
-		if (security_inode_alloc(inode)) {
-			if (inode->i_sb->s_op->destroy_inode)
-				inode->i_sb->s_op->destroy_inode(inode);
-			else
-				kmem_cache_free(inode_cachep, (inode));
-			return NULL;
-		}
+	inode->i_pipe = NULL;
+	inode->i_bdev = NULL;
+	inode->i_cdev = NULL;
+	inode->i_rdev = 0;
+	inode->dirtied_when = 0;
+	if (security_inode_alloc(inode)) {
+		if (inode->i_sb->s_op->destroy_inode)
+			inode->i_sb->s_op->destroy_inode(inode);
+		else
+			kmem_cache_free(inode_cachep, (inode));
+		return NULL;
+	}
 
-		spin_lock_init(&inode->i_lock);
-		lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
+	spin_lock_init(&inode->i_lock);
+	lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
 
-		mutex_init(&inode->i_mutex);
-		lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
+	mutex_init(&inode->i_mutex);
+	lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
 
-		init_rwsem(&inode->i_alloc_sem);
-		lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
+	init_rwsem(&inode->i_alloc_sem);
+	lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
 
-		mapping->a_ops = &empty_aops;
- 		mapping->host = inode;
-		mapping->flags = 0;
-		mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
-		mapping->assoc_mapping = NULL;
-		mapping->backing_dev_info = &default_backing_dev_info;
+	mapping->a_ops = &empty_aops;
+	mapping->host = inode;
+	mapping->flags = 0;
+	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
+	mapping->assoc_mapping = NULL;
+	mapping->backing_dev_info = &default_backing_dev_info;
 
-		/*
-		 * If the block_device provides a backing_dev_info for client
-		 * inodes then use that.  Otherwise the inode share the bdev's
-		 * backing_dev_info.
-		 */
-		if (sb->s_bdev) {
-			struct backing_dev_info *bdi;
+	/*
+	 * If the block_device provides a backing_dev_info for client
+	 * inodes then use that.  Otherwise the inode share the bdev's
+	 * backing_dev_info.
+	 */
+	if (sb->s_bdev) {
+		struct backing_dev_info *bdi;
 
-			bdi = sb->s_bdev->bd_inode_backing_dev_info;
-			if (!bdi)
-				bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
-			mapping->backing_dev_info = bdi;
-		}
-		inode->i_private = NULL;
-		inode->i_mapping = mapping;
+		bdi = sb->s_bdev->bd_inode_backing_dev_info;
+		if (!bdi)
+			bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
+		mapping->backing_dev_info = bdi;
 	}
+	inode->i_private = NULL;
+	inode->i_mapping = mapping;
+
 	return inode;
 }
+EXPORT_SYMBOL(inode_init_always);
+
+static struct inode *alloc_inode(struct super_block *sb)
+{
+	struct inode *inode;
+
+	if (sb->s_op->alloc_inode)
+		inode = sb->s_op->alloc_inode(sb);
+	else
+		inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+
+	if (inode)
+		return inode_init_always(sb, inode);
+	return NULL;
+}
 
 void destroy_inode(struct inode *inode) 
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 580b513..ce55983 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1811,6 +1811,7 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int origin);
 
 extern loff_t vfs_llseek(struct file *file, loff_t offset, int origin);
 
+extern struct inode * inode_init_always(struct super_block *, struct inode *);
 extern void inode_init_once(struct inode *);
 extern void iput(struct inode *);
 extern struct inode * igrab(struct inode *);
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 23/28] XFS: Combine the XFS and Linux inodes.
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (13 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 21/28] Inode: Allow external initialisers Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 24/28] XFS: move inode reclaim functions to xfs_sync.c Dave Chinner
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

To avoid issues with different lifecycles of XFS and Linux inodes,
embedd the linux inode inside the XFS inode. This means that the
linux inode has the same lifecycle as the XFS inode, even when it
has been released by the OS. XFS inodes don't live much longer than
this (a short stint in reclaim at most), so there isn't significant
memory usage penalties here.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_iops.c  |   17 +++-
 fs/xfs/linux-2.6/xfs_super.c |   48 +++++-------
 fs/xfs/linux-2.6/xfs_vnode.c |    6 +-
 fs/xfs/xfs_iget.c            |  168 +++++++++---------------------------------
 fs/xfs/xfs_inode.c           |   43 ++++++++---
 fs/xfs/xfs_inode.h           |    9 ++-
 fs/xfs/xfs_vnodeops.c        |   13 +---
 7 files changed, 110 insertions(+), 194 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 600165f..5f8ecc7 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -64,14 +64,14 @@ xfs_synchronize_atime(
 {
 	struct inode	*inode = VFS_I(ip);
 
-	if (inode) {
+	if (!(inode->i_state & I_CLEAR)) {
 		ip->i_d.di_atime.t_sec = (__int32_t)inode->i_atime.tv_sec;
 		ip->i_d.di_atime.t_nsec = (__int32_t)inode->i_atime.tv_nsec;
 	}
 }
 
 /*
- * If the linux inode exists, mark it dirty.
+ * If the linux inode is valid, mark it dirty.
  * Used when commiting a dirty inode into a transaction so that
  * the inode will get written back by the linux code
  */
@@ -81,7 +81,7 @@ xfs_mark_inode_dirty_sync(
 {
 	struct inode	*inode = VFS_I(ip);
 
-	if (inode)
+	if (!(inode->i_state & (I_WILL_FREE|I_FREEING|I_CLEAR)))
 		mark_inode_dirty_sync(inode);
 }
 
@@ -766,12 +766,21 @@ xfs_diflags_to_iflags(
  * When reading existing inodes from disk this is called directly
  * from xfs_iget, when creating a new inode it is called from
  * xfs_ialloc after setting up the inode.
+ *
+ * We are always called with an uninitialised linux inode here.
+ * We need to initialise the necessary fields and take a reference
+ * on it.
  */
 void
 xfs_setup_inode(
 	struct xfs_inode	*ip)
 {
-	struct inode		*inode = ip->i_vnode;
+	struct inode		*inode = &ip->i_vnode;
+
+	inode->i_ino = ip->i_ino;
+	inode->i_state = I_NEW|I_LOCK;
+	inode_add_to_lists(ip->i_mount->m_super, inode);
+	ASSERT(atomic_read(&inode->i_count) == 1);
 
 	inode->i_mode	= ip->i_d.di_mode;
 	inode->i_nlink	= ip->i_d.di_nlink;
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index ed12c5e..21ef6a4 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -71,7 +71,6 @@
 
 static struct quotactl_ops xfs_quotactl_operations;
 static struct super_operations xfs_super_operations;
-static kmem_zone_t *xfs_vnode_zone;
 static kmem_zone_t *xfs_ioend_zone;
 mempool_t *xfs_ioend_pool;
 
@@ -866,29 +865,25 @@ xfsaild_stop(
 }
 
 
-
+/* Catch misguided souls that try to use this interface on XFS */
 STATIC struct inode *
 xfs_fs_alloc_inode(
 	struct super_block	*sb)
 {
-	return kmem_zone_alloc(xfs_vnode_zone, KM_SLEEP);
+	BUG();
 }
 
+/*
+ * we need to provide an empty inode free function to prevent
+ * the generic code from trying to free ouuur combined inode.
+ */
 STATIC void
 xfs_fs_destroy_inode(
-	struct inode		*inode)
-{
-	kmem_zone_free(xfs_vnode_zone, inode);
-}
-
-STATIC void
-xfs_fs_inode_init_once(
-	void			*vnode)
+	struct inode	*inode)
 {
-	inode_init_once((struct inode *)vnode);
+	return;
 }
 
-
 /*
  * Slab object creation initialisation for the XFS inode.
  * This covers only the idempotent fields in the XFS inode;
@@ -897,13 +892,18 @@ xfs_fs_inode_init_once(
  * fields in the xfs inode that left in the initialise state
  * when freeing the inode.
  */
-void
-xfs_inode_init_once(
+STATIC void
+xfs_fs_inode_init_once(
 	void			*inode)
 {
 	struct xfs_inode	*ip = inode;
 
 	memset(ip, 0, sizeof(struct xfs_inode));
+
+	/* vfs inode */
+	inode_init_once(VFS_I(ip));
+
+	/* xfs inode */
 	atomic_set(&ip->i_iocount, 0);
 	atomic_set(&ip->i_pincount, 0);
 	spin_lock_init(&ip->i_flags_lock);
@@ -976,8 +976,6 @@ xfs_fs_clear_inode(
 		if (xfs_reclaim(ip))
 			panic("%s: cannot reclaim 0x%p\n", __func__, inode);
 	}
-
-	ASSERT(XFS_I(inode) == NULL);
 }
 
 STATIC void
@@ -1805,16 +1803,10 @@ xfs_free_trace_bufs(void)
 STATIC int __init
 xfs_init_zones(void)
 {
-	xfs_vnode_zone = kmem_zone_init_flags(sizeof(struct inode), "xfs_vnode",
-					KM_ZONE_HWALIGN | KM_ZONE_RECLAIM |
-					KM_ZONE_SPREAD,
-					xfs_fs_inode_init_once);
-	if (!xfs_vnode_zone)
-		goto out;
 
 	xfs_ioend_zone = kmem_zone_init(sizeof(xfs_ioend_t), "xfs_ioend");
 	if (!xfs_ioend_zone)
-		goto out_destroy_vnode_zone;
+		goto out;
 
 	xfs_ioend_pool = mempool_create_slab_pool(4 * MAX_BUF_PER_PAGE,
 						  xfs_ioend_zone);
@@ -1830,6 +1822,7 @@ xfs_init_zones(void)
 						"xfs_bmap_free_item");
 	if (!xfs_bmap_free_item_zone)
 		goto out_destroy_log_ticket_zone;
+
 	xfs_btree_cur_zone = kmem_zone_init(sizeof(xfs_btree_cur_t),
 						"xfs_btree_cur");
 	if (!xfs_btree_cur_zone)
@@ -1877,8 +1870,8 @@ xfs_init_zones(void)
 
 	xfs_inode_zone =
 		kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
-					KM_ZONE_HWALIGN | KM_ZONE_RECLAIM |
-					KM_ZONE_SPREAD, xfs_inode_init_once);
+			KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD,
+			xfs_fs_inode_init_once);
 	if (!xfs_inode_zone)
 		goto out_destroy_efi_zone;
 
@@ -1926,8 +1919,6 @@ xfs_init_zones(void)
 	mempool_destroy(xfs_ioend_pool);
  out_destroy_ioend_zone:
 	kmem_zone_destroy(xfs_ioend_zone);
- out_destroy_vnode_zone:
-	kmem_zone_destroy(xfs_vnode_zone);
  out:
 	return -ENOMEM;
 }
@@ -1952,7 +1943,6 @@ xfs_destroy_zones(void)
 	kmem_zone_destroy(xfs_log_ticket_zone);
 	mempool_destroy(xfs_ioend_pool);
 	kmem_zone_destroy(xfs_ioend_zone);
-	kmem_zone_destroy(xfs_vnode_zone);
 
 }
 
diff --git a/fs/xfs/linux-2.6/xfs_vnode.c b/fs/xfs/linux-2.6/xfs_vnode.c
index dceb6db..e5ec36e 100644
--- a/fs/xfs/linux-2.6/xfs_vnode.c
+++ b/fs/xfs/linux-2.6/xfs_vnode.c
@@ -90,11 +90,7 @@ vn_ioerror(
  */
 static inline int xfs_icount(struct xfs_inode *ip)
 {
-	struct inode *inode = VFS_I(ip);
-
-	if (!inode)
-		return atomic_read(&inode->i_count);
-	return -1;
+	return atomic_read(&VFS_I(ip)->i_count);
 }
 
 #define KTRACE_ENTER(ip, vk, s, line, ra)			\
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 12fe37e..f1a30dd 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -44,77 +44,65 @@
  */
 static int
 xfs_iget_cache_hit(
-	struct inode		*inode,
 	struct xfs_perag	*pag,
 	struct xfs_inode	*ip,
 	int			flags,
 	int			lock_flags) __releases(pag->pag_ici_lock)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct inode		*old_inode;
 	int			error = 0;
 
 	/*
 	 * If INEW is set this inode is being set up
+	 * If IRECLAIM is set this inode is being torn down
 	 * Pause and try again.
 	 */
-	if (xfs_iflags_test(ip, XFS_INEW)) {
+	if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
 		error = EAGAIN;
 		XFS_STATS_INC(xs_ig_frecycle);
 		goto out_error;
 	}
 
-	old_inode = ip->i_vnode;
-	if (old_inode == NULL) {
+	/* If IRECLAIMABLE is set, we've torn down the vfs inode part */
+	if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+
 		/*
-		 * If IRECLAIM is set this inode is
-		 * on its way out of the system,
-		 * we need to pause and try again.
+		 * If lookup is racing with unlink, then we should return an
+		 * error immediately so we don't remove it from the reclaim
+		 * list and potentially leak the inode.
 		 */
-		if (xfs_iflags_test(ip, XFS_IRECLAIM)) {
-			error = EAGAIN;
-			XFS_STATS_INC(xs_ig_frecycle);
+
+		if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
+			error = ENOENT;
 			goto out_error;
 		}
-		ASSERT(xfs_iflags_test(ip, XFS_IRECLAIMABLE));
+
+		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
 
 		/*
-		 * If lookup is racing with unlink, then we
-		 * should return an error immediately so we
-		 * don't remove it from the reclaim list and
-		 * potentially leak the inode.
+		 * We need to re-initialise the VFS inode as it has been
+		 * 'freed' by the VFS. Do this here so we can deal with
+		 * errors cleanly, then tag it so it can be set up correctly
+		 * later.
 		 */
-		if ((ip->i_d.di_mode == 0) &&
-		    !(flags & XFS_IGET_CREATE)) {
-			error = ENOENT;
+		if (!inode_init_always(mp->m_super, VFS_I(ip))) {
+			error = ENOMEM;
 			goto out_error;
 		}
-		xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
-
+		xfs_iflags_set(ip, XFS_INEW);
 		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
 		read_unlock(&pag->pag_ici_lock);
 
 		XFS_MOUNT_ILOCK(mp);
 		list_del_init(&ip->i_reclaim);
 		XFS_MOUNT_IUNLOCK(mp);
-
-	} else if (inode != old_inode) {
-		/* The inode is being torn down, pause and
-		 * try again.
-		 */
-		if (old_inode->i_state & (I_FREEING | I_CLEAR)) {
-			error = EAGAIN;
-			XFS_STATS_INC(xs_ig_frecycle);
-			goto out_error;
-		}
-/* Chances are the other vnode (the one in the inode) is being torn
-* down right now, and we landed on top of it. Question is, what do
-* we do? Unhook the old inode and hook up the new one?
-*/
-		cmn_err(CE_PANIC,
-	"xfs_iget_core: ambiguous vns: vp/0x%p, invp/0x%p",
-				old_inode, inode);
+	} else if (!igrab(VFS_I(ip))) {
+		/* If the VFS inode is being torn down, pause and try again. */
+		error = EAGAIN;
+		XFS_STATS_INC(xs_ig_frecycle);
+		goto out_error;
 	} else {
+		/* we've got a live one */
 		read_unlock(&pag->pag_ici_lock);
 	}
 
@@ -214,11 +202,11 @@ out_destroy:
 /*
  * Look up an inode by number in the given file system.
  * The inode is looked up in the cache held in each AG.
- * If the inode is found in the cache, attach it to the provided
- * vnode.
+ * If the inode is found in the cache, initialise the vfs inode
+ * if necessary.
  *
  * If it is not in core, read it in from the file system's device,
- * add it to the cache and attach the provided vnode.
+ * add it to the cache and initialise the vfs inode.
  *
  * The inode is locked according to the value of the lock_flags parameter.
  * This flag parameter indicates how and if the inode's IO lock and inode lock
@@ -235,9 +223,8 @@ out_destroy:
  * bno -- the block number starting the buffer containing the inode,
  *	  if known (as by bulkstat), else 0.
  */
-STATIC int
-xfs_iget_core(
-	struct inode	*inode,
+int
+xfs_iget(
 	xfs_mount_t	*mp,
 	xfs_trans_t	*tp,
 	xfs_ino_t	ino,
@@ -268,7 +255,7 @@ again:
 	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
 
 	if (ip) {
-		error = xfs_iget_cache_hit(inode, pag, ip, flags, lock_flags);
+		error = xfs_iget_cache_hit(pag, ip, flags, lock_flags);
 		if (error)
 			goto out_error_or_again;
 	} else {
@@ -282,23 +269,17 @@ again:
 	}
 	xfs_put_perag(mp, pag);
 
-	ASSERT(ip->i_df.if_ext_max ==
-	       XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t));
-
 	xfs_iflags_set(ip, XFS_IMODIFIED);
 	*ipp = ip;
 
-	/*
-	 * Set up the Linux with the Linux inode.
-	 */
-	ip->i_vnode = inode;
-	inode->i_private = ip;
-
+	ASSERT(ip->i_df.if_ext_max ==
+	       XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t));
 	/*
 	 * If we have a real type for an on-disk inode, we can set ops(&unlock)
 	 * now.	 If it's a new inode being created, xfs_ialloc will handle it.
 	 */
-	if (ip->i_d.di_mode != 0)
+	//if (ip->i_d.di_mode != 0 || xfs_iflags_test(ip, XFS_INEW))
+	if (xfs_iflags_test(ip, XFS_INEW) && ip->i_d.di_mode != 0)
 		xfs_setup_inode(ip);
 	return 0;
 
@@ -313,75 +294,6 @@ out_error_or_again:
 
 
 /*
- * The 'normal' internal xfs_iget, if needed it will
- * 'allocate', or 'get', the vnode.
- */
-int
-xfs_iget(
-	xfs_mount_t	*mp,
-	xfs_trans_t	*tp,
-	xfs_ino_t	ino,
-	uint		flags,
-	uint		lock_flags,
-	xfs_inode_t	**ipp,
-	xfs_daddr_t	bno)
-{
-	struct inode	*inode;
-	xfs_inode_t	*ip;
-	int		error;
-
-	XFS_STATS_INC(xs_ig_attempts);
-
-retry:
-	inode = iget_locked(mp->m_super, ino);
-	if (!inode)
-		/* If we got no inode we are out of memory */
-		return ENOMEM;
-
-	if (inode->i_state & I_NEW) {
-		XFS_STATS_INC(vn_active);
-		XFS_STATS_INC(vn_alloc);
-
-		error = xfs_iget_core(inode, mp, tp, ino, flags,
-				lock_flags, ipp, bno);
-		if (error) {
-			make_bad_inode(inode);
-			if (inode->i_state & I_NEW)
-				unlock_new_inode(inode);
-			iput(inode);
-		}
-		return error;
-	}
-
-	/*
-	 * If the inode is not fully constructed due to
-	 * filehandle mismatches wait for the inode to go
-	 * away and try again.
-	 *
-	 * iget_locked will call __wait_on_freeing_inode
-	 * to wait for the inode to go away.
-	 */
-	if (is_bad_inode(inode)) {
-		iput(inode);
-		delay(1);
-		goto retry;
-	}
-
-	ip = XFS_I(inode);
-	if (!ip) {
-		iput(inode);
-		delay(1);
-		goto retry;
-	}
-
-	if (lock_flags != 0)
-		xfs_ilock(ip, lock_flags);
-	XFS_STATS_INC(xs_ig_found);
-	*ipp = ip;
-	return 0;
-}
-
-/*
  * Look for the inode corresponding to the given ino in the hash table.
  * If it is there and its i_transp pointer matches tp, return it.
  * Otherwise, return NULL.
@@ -481,14 +393,6 @@ xfs_ireclaim(xfs_inode_t *ip)
 	XFS_QM_DQDETACH(ip->i_mount, ip);
 
 	/*
-	 * Pull our behavior descriptor from the vnode chain.
-	 */
-	if (ip->i_vnode) {
-		ip->i_vnode->i_private = NULL;
-		ip->i_vnode = NULL;
-	}
-
-	/*
 	 * Free all memory associated with the inode.
 	 */
 	xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 91a003d..30f3d40 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -812,6 +812,16 @@ xfs_inode_alloc(
 	ASSERT(list_empty(&ip->i_reclaim));
 	ASSERT(completion_done(&ip->i_flush));
 
+	/*
+	 * initialise the VFS inode here to get failures
+	 * out of the way early.
+	 */
+	if (!inode_init_always(mp->m_super, VFS_I(ip))) {
+		kmem_zone_free(xfs_inode_zone, ip);
+		return NULL;
+	}
+
+	/* initialise the xfs inode */
 	ip->i_ino = ino;
 	ip->i_mount = mp;
 	ip->i_blkno = 0;
@@ -1086,6 +1096,7 @@ xfs_ialloc(
 	uint		flags;
 	int		error;
 	timespec_t	tv;
+	int		filestreams = 0;
 
 	/*
 	 * Call the space management code to pick
@@ -1093,9 +1104,8 @@ xfs_ialloc(
 	 */
 	error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc,
 			    ialloc_context, call_again, &ino);
-	if (error != 0) {
+	if (error)
 		return error;
-	}
 	if (*call_again || ino == NULLFSINO) {
 		*ipp = NULL;
 		return 0;
@@ -1109,9 +1119,8 @@ xfs_ialloc(
 	 */
 	error = xfs_trans_iget(tp->t_mountp, tp, ino,
 				XFS_IGET_CREATE, XFS_ILOCK_EXCL, &ip);
-	if (error != 0) {
+	if (error)
 		return error;
-	}
 	ASSERT(ip != NULL);
 
 	ip->i_d.di_mode = (__uint16_t)mode;
@@ -1192,13 +1201,12 @@ xfs_ialloc(
 		flags |= XFS_ILOG_DEV;
 		break;
 	case S_IFREG:
-		if (pip && xfs_inode_is_filestream(pip)) {
-			error = xfs_filestream_associate(pip, ip);
-			if (error < 0)
-				return -error;
-			if (!error)
-				xfs_iflags_set(ip, XFS_IFILESTREAM);
-		}
+		/*
+		 * we can't set up filestreams until after the VFS inode
+		 * is set up properly.
+		 */
+		if (pip && xfs_inode_is_filestream(pip))
+			filestreams = 1;
 		/* fall through */
 	case S_IFDIR:
 		if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) {
@@ -1264,6 +1272,15 @@ xfs_ialloc(
 	/* now that we have an i_mode we can setup inode ops and unlock */
 	xfs_setup_inode(ip);
 
+	/* now we have set up the vfs inode we can associate the filestream */
+	if (filestreams) {
+		error = xfs_filestream_associate(pip, ip);
+		if (error < 0)
+			return -error;
+		if (!error)
+			xfs_iflags_set(ip, XFS_IFILESTREAM);
+	}
+
 	*ipp = ip;
 	return 0;
 }
@@ -2654,6 +2671,10 @@ xfs_idestroy_fork(
  * It must free the inode itself and any buffers allocated for
  * if_extents/if_data and if_broot.  It must also free the lock
  * associated with the inode.
+ *
+ * Note: because we don't initialise everything on reallocation out
+ * of the zone, we must ensure we nullify everything correctly before
+ * freeing the structure.
  */
 void
 xfs_idestroy(
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 813c9b0..4350d94 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -195,7 +195,6 @@ typedef struct xfs_inode {
 	/* Inode linking and identification information. */
 	struct xfs_mount	*i_mount;	/* fs mount struct ptr */
 	struct list_head	i_reclaim;	/* reclaim list */
-	struct inode		*i_vnode;	/* vnode backpointer */
 	struct xfs_dquot	*i_udquot;	/* user dquot */
 	struct xfs_dquot	*i_gdquot;	/* group dquot */
 
@@ -229,6 +228,10 @@ typedef struct xfs_inode {
 	xfs_fsize_t		i_size;		/* in-memory size */
 	xfs_fsize_t		i_new_size;	/* size when write completes */
 	atomic_t		i_iocount;	/* outstanding I/O count */
+
+	/* VFS inode */
+	struct inode		i_vnode;	/* embedded VFS inode */
+
 	/* Trace buffers per inode. */
 #ifdef XFS_INODE_TRACE
 	struct ktrace		*i_trace;	/* general inode trace */
@@ -256,13 +259,13 @@ typedef struct xfs_inode {
 /* Convert from vfs inode to xfs inode */
 static inline struct xfs_inode *XFS_I(struct inode *inode)
 {
-	return (struct xfs_inode *)inode->i_private;
+	return container_of(inode, struct xfs_inode, i_vnode);
 }
 
 /* convert from xfs inode to vfs inode */
 static inline struct inode *VFS_I(struct xfs_inode *ip)
 {
-	return (struct inode *)ip->i_vnode;
+	return (struct inode *)&ip->i_vnode;
 }
 
 /*
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index d6eec3f..c0ce803 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2800,6 +2800,7 @@ xfs_reclaim(
 	if (!ip->i_update_core && (ip->i_itemp == NULL)) {
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_iflock(ip);
+		xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 		return xfs_finish_reclaim(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
 	} else {
 		xfs_mount_t	*mp = ip->i_mount;
@@ -2808,8 +2809,6 @@ xfs_reclaim(
 		XFS_MOUNT_ILOCK(mp);
 		spin_lock(&ip->i_flags_lock);
 		__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
-		VFS_I(ip)->i_private = NULL;
-		ip->i_vnode = NULL;
 		spin_unlock(&ip->i_flags_lock);
 		list_add_tail(&ip->i_reclaim, &mp->m_del_inodes);
 		XFS_MOUNT_IUNLOCK(mp);
@@ -2824,10 +2823,6 @@ xfs_finish_reclaim(
 	int		sync_mode)
 {
 	xfs_perag_t	*pag = xfs_get_perag(ip->i_mount, ip->i_ino);
-	struct inode	*vp = VFS_I(ip);
-
-	if (vp && VN_BAD(vp))
-		goto reclaim;
 
 	/* The hash lock here protects a thread in xfs_iget_core from
 	 * racing with us on linking the inode back with a vnode.
@@ -2837,7 +2832,7 @@ xfs_finish_reclaim(
 	write_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
 	if (__xfs_iflags_test(ip, XFS_IRECLAIM) ||
-	    (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) && vp == NULL)) {
+	    !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
 		spin_unlock(&ip->i_flags_lock);
 		write_unlock(&pag->pag_ici_lock);
 		if (locked) {
@@ -2871,15 +2866,13 @@ xfs_finish_reclaim(
 	 * In the case of a forced shutdown we rely on xfs_iflush() to
 	 * wait for the inode to be unpinned before returning an error.
 	 */
-	if (xfs_iflush(ip, sync_mode) == 0) {
+	if (!VN_BAD(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
 		/* synchronize with xfs_iflush_done */
 		xfs_iflock(ip);
 		xfs_ifunlock(ip);
 	}
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
- reclaim:
 	xfs_ireclaim(ip);
 	return 0;
 }
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 24/28] XFS: move inode reclaim functions to xfs_sync.c
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (14 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 23/28] XFS: Combine the XFS and Linux inodes Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 25/28] XFS: mark inodes for reclaim via a tag in the inode radix tree Dave Chinner
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Background inode reclaim is run by the xfssyncd. Move the
reclaim worker functions to be close to the sync code as
the are very similar in structure and are both run from
the same background thread.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   91 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/linux-2.6/xfs_sync.h |    3 +
 fs/xfs/xfs_inode.h          |    2 -
 fs/xfs/xfs_vnodeops.c       |   90 ------------------------------------------
 4 files changed, 94 insertions(+), 92 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 98a33a5..afb0569 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -579,3 +579,94 @@ xfs_syncd_stop(
 	kthread_stop(mp->m_sync_task);
 }
 
+int
+xfs_finish_reclaim(
+	xfs_inode_t	*ip,
+	int		locked,
+	int		sync_mode)
+{
+	xfs_perag_t	*pag = xfs_get_perag(ip->i_mount, ip->i_ino);
+
+	/* The hash lock here protects a thread in xfs_iget_core from
+	 * racing with us on linking the inode back with a vnode.
+	 * Once we have the XFS_IRECLAIM flag set it will not touch
+	 * us.
+	 */
+	write_lock(&pag->pag_ici_lock);
+	spin_lock(&ip->i_flags_lock);
+	if (__xfs_iflags_test(ip, XFS_IRECLAIM) ||
+	    !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+		spin_unlock(&ip->i_flags_lock);
+		write_unlock(&pag->pag_ici_lock);
+		if (locked) {
+			xfs_ifunlock(ip);
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		}
+		return 1;
+	}
+	__xfs_iflags_set(ip, XFS_IRECLAIM);
+	spin_unlock(&ip->i_flags_lock);
+	write_unlock(&pag->pag_ici_lock);
+	xfs_put_perag(ip->i_mount, pag);
+
+	/*
+	 * If the inode is still dirty, then flush it out.  If the inode
+	 * is not in the AIL, then it will be OK to flush it delwri as
+	 * long as xfs_iflush() does not keep any references to the inode.
+	 * We leave that decision up to xfs_iflush() since it has the
+	 * knowledge of whether it's OK to simply do a delwri flush of
+	 * the inode or whether we need to wait until the inode is
+	 * pulled from the AIL.
+	 * We get the flush lock regardless, though, just to make sure
+	 * we don't free it while it is being flushed.
+	 */
+	if (!locked) {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		xfs_iflock(ip);
+	}
+
+	/*
+	 * In the case of a forced shutdown we rely on xfs_iflush() to
+	 * wait for the inode to be unpinned before returning an error.
+	 */
+	if (!VN_BAD(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
+		/* synchronize with xfs_iflush_done */
+		xfs_iflock(ip);
+		xfs_ifunlock(ip);
+	}
+
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_ireclaim(ip);
+	return 0;
+}
+
+int
+xfs_finish_reclaim_all(
+	xfs_mount_t	*mp,
+	int		 noblock,
+	int		mode)
+{
+	xfs_inode_t	*ip, *n;
+
+restart:
+	XFS_MOUNT_ILOCK(mp);
+	list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
+		if (noblock) {
+			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
+				continue;
+			if (xfs_ipincount(ip) ||
+			    !xfs_iflock_nowait(ip)) {
+				xfs_iunlock(ip, XFS_ILOCK_EXCL);
+				continue;
+			}
+		}
+		XFS_MOUNT_IUNLOCK(mp);
+		if (xfs_finish_reclaim(ip, noblock, mode))
+			delay(1);
+		goto restart;
+	}
+	XFS_MOUNT_IUNLOCK(mp);
+	return 0;
+}
+
+
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 3b49aa3..23117a1 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -45,4 +45,7 @@ void xfs_quiesce_attr(struct xfs_mount *mp);
 void xfs_flush_inode(struct xfs_inode *ip);
 void xfs_flush_device(struct xfs_inode *ip);
 
+int xfs_finish_reclaim(struct xfs_inode *ip, int locked, int sync_mode);
+int xfs_finish_reclaim_all(struct xfs_mount *mp, int noblock, int mode);
+
 #endif
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 4350d94..b81ebca 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -478,8 +478,6 @@ int		xfs_isilocked(xfs_inode_t *, uint);
 uint		xfs_ilock_map_shared(xfs_inode_t *);
 void		xfs_iunlock_map_shared(xfs_inode_t *, uint);
 void		xfs_ireclaim(xfs_inode_t *);
-int		xfs_finish_reclaim(xfs_inode_t *, int, int);
-int		xfs_finish_reclaim_all(struct xfs_mount *, int, int);
 
 /*
  * xfs_inode.c prototypes.
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index c0ce803..dc4d807 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2816,96 +2816,6 @@ xfs_reclaim(
 	return 0;
 }
 
-int
-xfs_finish_reclaim(
-	xfs_inode_t	*ip,
-	int		locked,
-	int		sync_mode)
-{
-	xfs_perag_t	*pag = xfs_get_perag(ip->i_mount, ip->i_ino);
-
-	/* The hash lock here protects a thread in xfs_iget_core from
-	 * racing with us on linking the inode back with a vnode.
-	 * Once we have the XFS_IRECLAIM flag set it will not touch
-	 * us.
-	 */
-	write_lock(&pag->pag_ici_lock);
-	spin_lock(&ip->i_flags_lock);
-	if (__xfs_iflags_test(ip, XFS_IRECLAIM) ||
-	    !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
-		spin_unlock(&ip->i_flags_lock);
-		write_unlock(&pag->pag_ici_lock);
-		if (locked) {
-			xfs_ifunlock(ip);
-			xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		}
-		return 1;
-	}
-	__xfs_iflags_set(ip, XFS_IRECLAIM);
-	spin_unlock(&ip->i_flags_lock);
-	write_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(ip->i_mount, pag);
-
-	/*
-	 * If the inode is still dirty, then flush it out.  If the inode
-	 * is not in the AIL, then it will be OK to flush it delwri as
-	 * long as xfs_iflush() does not keep any references to the inode.
-	 * We leave that decision up to xfs_iflush() since it has the
-	 * knowledge of whether it's OK to simply do a delwri flush of
-	 * the inode or whether we need to wait until the inode is
-	 * pulled from the AIL.
-	 * We get the flush lock regardless, though, just to make sure
-	 * we don't free it while it is being flushed.
-	 */
-	if (!locked) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		xfs_iflock(ip);
-	}
-
-	/*
-	 * In the case of a forced shutdown we rely on xfs_iflush() to
-	 * wait for the inode to be unpinned before returning an error.
-	 */
-	if (!VN_BAD(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
-		/* synchronize with xfs_iflush_done */
-		xfs_iflock(ip);
-		xfs_ifunlock(ip);
-	}
-
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	xfs_ireclaim(ip);
-	return 0;
-}
-
-int
-xfs_finish_reclaim_all(
-	xfs_mount_t	*mp,
-	int		 noblock,
-	int		mode)
-{
-	xfs_inode_t	*ip, *n;
-
-restart:
-	XFS_MOUNT_ILOCK(mp);
-	list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
-		if (noblock) {
-			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
-				continue;
-			if (xfs_ipincount(ip) ||
-			    !xfs_iflock_nowait(ip)) {
-				xfs_iunlock(ip, XFS_ILOCK_EXCL);
-				continue;
-			}
-		}
-		XFS_MOUNT_IUNLOCK(mp);
-		if (xfs_finish_reclaim(ip, noblock, mode))
-			delay(1);
-		goto restart;
-	}
-	XFS_MOUNT_IUNLOCK(mp);
-	return 0;
-}
-
 /*
  * xfs_alloc_file_space()
  *      This routine allocates disk space for the given file.
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 25/28] XFS: mark inodes for reclaim via a tag in the inode radix tree
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (15 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 24/28] XFS: move inode reclaim functions to xfs_sync.c Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 13:16 ` [PATCH 26/28] XFS: rename inode reclaim functions Dave Chinner
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

Prepare for removing the deleted inode list by marking inodes
for reclaim in the inode radix trees so that we can use the
radix trees to find reclaimable inodes.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   41 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/linux-2.6/xfs_sync.h |    4 ++++
 fs/xfs/xfs_ag.h             |    5 +++++
 fs/xfs/xfs_iget.c           |    3 +++
 fs/xfs/xfs_vnodeops.c       |    1 +
 5 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index afb0569..ebe2932 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -640,6 +640,47 @@ xfs_finish_reclaim(
 	return 0;
 }
 
+void
+xfs_inode_set_reclaim_tag(
+	xfs_inode_t	*ip)
+{
+	xfs_mount_t	*mp = ip->i_mount;
+	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
+
+	read_lock(&pag->pag_ici_lock);
+	spin_lock(&ip->i_flags_lock);
+	radix_tree_tag_set(&pag->pag_ici_root,
+			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+	spin_unlock(&ip->i_flags_lock);
+	read_unlock(&pag->pag_ici_lock);
+	xfs_put_perag(mp, pag);
+}
+
+void
+__xfs_inode_clear_reclaim_tag(
+	xfs_mount_t	*mp,
+	xfs_perag_t	*pag,
+	xfs_inode_t	*ip)
+{
+	radix_tree_tag_clear(&pag->pag_ici_root,
+			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+}
+
+void
+xfs_inode_clear_reclaim_tag(
+	xfs_inode_t	*ip)
+{
+	xfs_mount_t	*mp = ip->i_mount;
+	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
+
+	read_lock(&pag->pag_ici_lock);
+	spin_lock(&ip->i_flags_lock);
+	__xfs_inode_clear_reclaim_tag(mp, pag, ip);
+	spin_unlock(&ip->i_flags_lock);
+	read_unlock(&pag->pag_ici_lock);
+	xfs_put_perag(mp, pag);
+}
+
 int
 xfs_finish_reclaim_all(
 	xfs_mount_t	*mp,
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 23117a1..fcaf004 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -48,4 +48,8 @@ void xfs_flush_device(struct xfs_inode *ip);
 int xfs_finish_reclaim(struct xfs_inode *ip, int locked, int sync_mode);
 int xfs_finish_reclaim_all(struct xfs_mount *mp, int noblock, int mode);
 
+void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
+				struct xfs_inode *ip);
 #endif
diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 61b292a..426e438 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -203,6 +203,11 @@ typedef struct xfs_perag
 	struct radix_tree_root pag_ici_root;	/* incore inode cache root */
 } xfs_perag_t;
 
+/*
+ * tags for inode radix tree
+ */
+#define XFS_ICI_RECLAIM_TAG	0	/* inode is to be reclaimed */
+
 #define	XFS_AG_MAXLEVELS(mp)		((mp)->m_ag_maxlevels)
 #define	XFS_MIN_FREELIST_RAW(bl,cl,mp)	\
 	(MIN(bl + 1, XFS_AG_MAXLEVELS(mp)) + MIN(cl + 1, XFS_AG_MAXLEVELS(mp)))
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index f1a30dd..f06c970 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -91,6 +91,9 @@ xfs_iget_cache_hit(
 		}
 		xfs_iflags_set(ip, XFS_INEW);
 		xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
+
+		/* clear the radix tree reclaim flag as well. */
+		__xfs_inode_clear_reclaim_tag(mp, pag, ip);
 		read_unlock(&pag->pag_ici_lock);
 
 		XFS_MOUNT_ILOCK(mp);
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index dc4d807..1f56ea1 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2812,6 +2812,7 @@ xfs_reclaim(
 		spin_unlock(&ip->i_flags_lock);
 		list_add_tail(&ip->i_reclaim, &mp->m_del_inodes);
 		XFS_MOUNT_IUNLOCK(mp);
+		xfs_inode_set_reclaim_tag(ip);
 	}
 	return 0;
 }
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 26/28] XFS: rename inode reclaim functions
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (16 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 25/28] XFS: mark inodes for reclaim via a tag in the inode radix tree Dave Chinner
@ 2008-08-19 13:16 ` Dave Chinner
  2008-08-19 23:11 ` [PATCH 0/28] XFS: sync and reclaim rework Mark Goodwin
  2008-08-19 23:59 ` Dave Chinner
  19 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 13:16 UTC (permalink / raw)
  To: xfs

The function names xfs_finish_reclaim and xfs_finish_reclaim_all
are not very descriptive of what they are reclaiming. Rename to
xfs_reclaim_inode[s] to match the xfs_sync_inodes() function.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   10 +++++-----
 fs/xfs/linux-2.6/xfs_sync.h |    4 ++--
 fs/xfs/xfs_mount.c          |    2 +-
 fs/xfs/xfs_vnodeops.c       |    2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index ebe2932..3141c25 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -359,7 +359,7 @@ xfs_quiesce_fs(
 	int	count = 0, pincount;
 
 	xfs_flush_buftarg(mp->m_ddev_targp, 0);
-	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+	xfs_reclaim_inodes(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 
 	/*
 	 * This loop must run at least twice.  The first instance of the loop
@@ -501,7 +501,7 @@ xfs_sync_worker(
 
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
 		xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
-		xfs_finish_reclaim_all(mp, 1, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+		xfs_reclaim_inodes(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 		/* dgc: errors ignored here */
 		error = XFS_QM_DQSYNC(mp, SYNC_BDFLUSH);
 		error = xfs_sync_fsdata(mp, SYNC_BDFLUSH);
@@ -580,7 +580,7 @@ xfs_syncd_stop(
 }
 
 int
-xfs_finish_reclaim(
+xfs_reclaim_inode(
 	xfs_inode_t	*ip,
 	int		locked,
 	int		sync_mode)
@@ -682,7 +682,7 @@ xfs_inode_clear_reclaim_tag(
 }
 
 int
-xfs_finish_reclaim_all(
+xfs_reclaim_inodes(
 	xfs_mount_t	*mp,
 	int		 noblock,
 	int		mode)
@@ -702,7 +702,7 @@ restart:
 			}
 		}
 		XFS_MOUNT_IUNLOCK(mp);
-		if (xfs_finish_reclaim(ip, noblock, mode))
+		if (xfs_reclaim_inode(ip, noblock, mode))
 			delay(1);
 		goto restart;
 	}
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index fcaf004..5f6de1e 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -45,8 +45,8 @@ void xfs_quiesce_attr(struct xfs_mount *mp);
 void xfs_flush_inode(struct xfs_inode *ip);
 void xfs_flush_device(struct xfs_inode *ip);
 
-int xfs_finish_reclaim(struct xfs_inode *ip, int locked, int sync_mode);
-int xfs_finish_reclaim_all(struct xfs_mount *mp, int noblock, int mode);
+int xfs_reclaim_inode(struct xfs_inode *ip, int locked, int sync_mode);
+int xfs_reclaim_inodes(struct xfs_mount *mp, int noblock, int mode);
 
 void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
 void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 8372b3d..25698ee 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1241,7 +1241,7 @@ xfs_unmountfs(
 	 * need to force the log first.
 	 */
 	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
-	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_ASYNC);
+	xfs_reclaim_inodes(mp, 0, XFS_IFLUSH_ASYNC);
 
 	XFS_QM_DQPURGEALL(mp, XFS_QMOPT_QUOTALL | XFS_QMOPT_UMOUNTING);
 
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 1f56ea1..47b3e68 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2801,7 +2801,7 @@ xfs_reclaim(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_iflock(ip);
 		xfs_iflags_set(ip, XFS_IRECLAIMABLE);
-		return xfs_finish_reclaim(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
+		return xfs_reclaim_inode(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
 	} else {
 		xfs_mount_t	*mp = ip->i_mount;
 
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/28] XFS: sync and reclaim rework
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (17 preceding siblings ...)
  2008-08-19 13:16 ` [PATCH 26/28] XFS: rename inode reclaim functions Dave Chinner
@ 2008-08-19 23:11 ` Mark Goodwin
  2008-08-19 23:39   ` Dave Chinner
  2008-08-19 23:59 ` Dave Chinner
  19 siblings, 1 reply; 23+ messages in thread
From: Mark Goodwin @ 2008-08-19 23:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs-oss

Thanks Dave - this is queued behind the btree factoring series;
both need to be QA'd and stress/perf tested independently, which
leads me to ask: how much QA has the sync/reclaim rework received?
(and could you make use of the machine Christoph has been using,
since you're in non-overlapping TZs?)

Also, if we were to change the inode / block offset direct mapping
to an indirect method (e.g. inode32+), would this grossly affect
the ascending inode number traversal optimization? If so, could that
mechanism be made conditional?

Thanks
-- Mark


Dave Chinner wrote:
> Multiple patch sets, all in one patch bomb against a current
> git tree. This includes all outstanding patches I have previously
> sent that are not committed plus a bunch more...
> 
> ---
> 
> XFS: replace the mount inode list with radix tree traversals V4
> 
> The list of all inodes on a mount is superfluous. We can traverse
> all inodes now by walking the per-AG inode radix trees without
> needing a separate list. This enables us to remove a bunch of
> complex list traversal code and remove another two pointers from
> the xfs_inode.
> 
> Also, by replacing the sync traversal with an ascending inode
> number traversal, we will issue better inode I/O patterns for
> writeback triggered by xfssyncd or unmount.
> 
> Before we make this change, move all the relevant sync code
> into it's own file in the linux-2.6/ directory. This aggregates
> VFS specific sync interfacing in the one file and will allow
> all the subsequent change history to be associated with this
> file so it is easy to find in future.
> 
> Version 4:
> o revert xfs_syncsub -> xfs_sync change in xfs_quiesce_fs and
>   rediff patch series
> 
> ---
> 
> XFS: clean up sync code
> 
> xfs_sync and xfs_syncsub are multiplexed interfaces that
> shares relatively little code between callers. because it is
> a multiplexed interface, it's hard to tell what is executed
> in each context it is called.
> 
> Factor out the sync code and explicitly call the sync functions
> needed rather than the multiplexed interfaces. Once this is
> done, we can remove xfs_syncsub and xfs_sync altogether.
> 
> ---
> 
> RFC: Combine Linux and XFS inodes V2
> 
> XFS currently has to deal with two separate inode lifecycles
> which makes for complexity in inode lookups and reclaim. We
> also have the problem of not always having a linux inode around
> when it might be useful to have it.
> 
> To avoid these lifecycle problems, this series embedѕ the linux
> inode inside the struct xfs_inode and changes the way we reference
> to two inodes. We can no longer check for a null linux inode -
> instead we have to check to see if it is valid or not by checking
> either the linux inode or xfs inode state flags. While this means
> that inodes waiting for reclaim use more memory, this is not the
> commonn state for inodes and the will soon be completely freed so
> the additional memeory use in this state is only a temporary issue.
> 
> This combining of the inodes simplifies the inode and reclaim logic,
> making it possible to do reclaim via radix tree tags (an upcoming
> patch series) and to be able to use RCU locking on the radix trees.
> The fact that we don't have a simple mechanism to determine the
> reclaim state of the inode makes RCU locking very complex, and this
> complexity is removed by having a combined inode structure.
> 
> This patch series also changes the way XFS caches inodes. It no
> longer uses the linux inode cache as the primary lookup cache -
> instead we rely solely on the XFS inode caches. This avoids the
> inode_lock in lookups that hit the cache - we should get much
> better parallelism out of inode lookup than we currently do now.
> 
> The patch series also makes use of the slab 'init once' feature
> for the XFS inodes. This means we only need to do partial
> initialisation of the xfs (and embedded linux inode) whenever
> we allocate a new inode.
> 
> In future, we should also be able to cull duplicate fields out of
> the xfs and linux inodes reducing the overall memory usage of
> the active inode cache. This provides scope for continuing to
> reduce the memory footprint of the XFS inode cache.
> 
> Version 2
> o reorder and rework as a result of review comments.
> 
> ---
> 
> XFS: Track reclaimable inodes in inode cache.
> 
> Move the tracking of reclaimable inodes
> into the inode radix trees. This currently does not replace
> the reclaim flags in the inode, rather it allows traversal of
> all reclaimable inodes by walking the per-AG inode radix trees without needing
> a separate list. This enables us to remove a list and a lock to
> remove a point of serialisation during inode reclaim.
> 
> 
> Like the matching sync code, this also allows reclaim of inodes
> in ascending inode numbers which substantially improves I/O
> patterns during reclaim driven inode flushing.
> 
> ---
> 
> Combined diffstat:
> 
>  fs/inode.c                     |  205 ++++++----
>  fs/xfs/Makefile                |    1
>  fs/xfs/linux-2.6/xfs_aops.c    |    2
>  fs/xfs/linux-2.6/xfs_iops.c    |   19
>  fs/xfs/linux-2.6/xfs_super.c   |  265 +++----------
>  fs/xfs/linux-2.6/xfs_super.h   |    3
>  fs/xfs/linux-2.6/xfs_sync.c    |  780 +++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/linux-2.6/xfs_sync.h    |   55 ++
>  fs/xfs/linux-2.6/xfs_vfs.h     |   31 -
>  fs/xfs/linux-2.6/xfs_vnode.c   |    6
>  fs/xfs/linux-2.6/xfs_vnode.h   |    5
>  fs/xfs/quota/xfs_qm.c          |   10
>  fs/xfs/quota/xfs_qm_syscalls.c |  137 +++----
>  fs/xfs/xfs_ag.h                |    5
>  fs/xfs/xfs_iget.c              |  473 +++++++++---------------
>  fs/xfs/xfs_inode.c             |  140 ++++---
>  fs/xfs/xfs_inode.h             |   22 -
>  fs/xfs/xfs_itable.c            |   14
>  fs/xfs/xfs_mount.c             |    8
>  fs/xfs/xfs_mount.h             |   12
>  fs/xfs/xfs_vfsops.c            |  617 --------------------------------
>  fs/xfs/xfs_vfsops.h            |    2
>  fs/xfs/xfs_vnodeops.c          |  118 ------
>  include/linux/fs.h             |    2
>  24 files changed, 1391 insertions(+), 1541 deletions(-)
> 
> 
> 

-- 

  Mark Goodwin                                  markgw@sgi.com
  Engineering Manager for XFS and PCP    Phone: +61-3-99631937
  SGI Australian Software Group           Cell: +61-4-18969583
-------------------------------------------------------------

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/28] XFS: sync and reclaim rework
  2008-08-19 23:11 ` [PATCH 0/28] XFS: sync and reclaim rework Mark Goodwin
@ 2008-08-19 23:39   ` Dave Chinner
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 23:39 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: xfs-oss

On Wed, Aug 20, 2008 at 09:11:49AM +1000, Mark Goodwin wrote:
> Thanks Dave - this is queued behind the btree factoring series;
> both need to be QA'd and stress/perf tested independently, which
> leads me to ask: how much QA has the sync/reclaim rework received?

It passes xfsqa. I've only been caring about correctness and getting
the code into a decent shape right now. Performance should not
change significantly for most workloads. The workload that
might benefit the most (cached lookups on a SMP NFS server) I
can't test myself anyway...

> (and could you make use of the machine Christoph has been using,
> since you're in non-overlapping TZs?)
> Also, if we were to change the inode / block offset direct mapping
> to an indirect method (e.g. inode32+), would this grossly affect
> the ascending inode number traversal optimization? If so, could that
> mechanism be made conditional?

Depends on the indirection mechanism. with inode32+, it was making
the AG number indirect. The per-ag radix trees only index the offset
of the inode into the AG, so such a change would not adversely
impact the new algorithm.

If we move to full location independence for inode numbers (not that
this will ever happen) then we'd fall back to the current situation
of effectively random order reclaim.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/28] XFS: sync and reclaim rework
  2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
                   ` (18 preceding siblings ...)
  2008-08-19 23:11 ` [PATCH 0/28] XFS: sync and reclaim rework Mark Goodwin
@ 2008-08-19 23:59 ` Dave Chinner
  2008-08-22  1:44   ` Christoph Hellwig
  19 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2008-08-19 23:59 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: Type: text/plain, Size: 451 bytes --]

On Tue, Aug 19, 2008 at 11:16:16PM +1000, Dave Chinner wrote:
> 
> Multiple patch sets, all in one patch bomb against a current
> git tree. This includes all outstanding patches I have previously
> sent that are not committed plus a bunch more...

A bunch of ppl (including me) have not seen the entire patchset
come through the list yet [*], so I've tarred up the patch set and
attached it below.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

[-- Attachment #2: xfs-sync-rework.tar.gz --]
[-- Type: application/octet-stream, Size: 50289 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/28] XFS: sync and reclaim rework
  2008-08-19 23:59 ` Dave Chinner
@ 2008-08-22  1:44   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2008-08-22  1:44 UTC (permalink / raw)
  To: xfs

On Wed, Aug 20, 2008 at 09:59:20AM +1000, Dave Chinner wrote:
> On Tue, Aug 19, 2008 at 11:16:16PM +1000, Dave Chinner wrote:
> > 
> > Multiple patch sets, all in one patch bomb against a current
> > git tree. This includes all outstanding patches I have previously
> > sent that are not committed plus a bunch more...
> 
> A bunch of ppl (including me) have not seen the entire patchset
> come through the list yet [*], so I've tarred up the patch set and
> attached it below.

Thanks, the patches looks good to me modulo the consmetic comments
pointed out on IRC.  They also survive xfsqs ontop of my btree patches.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2008-08-22  1:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-19 13:16 [PATCH 0/28] XFS: sync and reclaim rework Dave Chinner
2008-08-19 13:16 ` [PATCH 01/28] XFS: remove i_gen from incore inode Dave Chinner
2008-08-19 13:16 ` [PATCH 02/28] XFS: move sync code to its own file Dave Chinner
2008-08-19 13:16 ` [PATCH 04/28] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V3 Dave Chinner
2008-08-19 13:16 ` [PATCH 05/28] XFS: Use the inode tree for finding dirty inodes V3 Dave Chinner
2008-08-19 13:16 ` [PATCH 06/28] XFS: Traverse inode trees when releasing dquots V3 Dave Chinner
2008-08-19 13:16 ` [PATCH 08/28] XFS: split out two helpers from xfs_syncsub Dave Chinner
2008-08-19 13:16 ` [PATCH 09/28] XFS: Use struct inodes instead of vnodes to kill vn_grab Dave Chinner
2008-08-19 13:16 ` [PATCH 12/28] XFS: xfssyncd: don't call xfs_sync Dave Chinner
2008-08-19 13:16 ` [PATCH 13/28] XFS: make SYNC_ATTR no longer use xfs_sync Dave Chinner
2008-08-19 13:16 ` [PATCH 14/28] XFS: make SYNC_DELWRI " Dave Chinner
2008-08-19 13:16 ` [PATCH 16/28] XFS: Kill xfs_sync() Dave Chinner
2008-08-19 13:16 ` [PATCH 17/28] XFS: Move remaining quiesce code Dave Chinner
2008-08-19 13:16 ` [PATCH 20/28] XFS: Never call mark_inode_dirty_sync() directly Dave Chinner
2008-08-19 13:16 ` [PATCH 21/28] Inode: Allow external initialisers Dave Chinner
2008-08-19 13:16 ` [PATCH 23/28] XFS: Combine the XFS and Linux inodes Dave Chinner
2008-08-19 13:16 ` [PATCH 24/28] XFS: move inode reclaim functions to xfs_sync.c Dave Chinner
2008-08-19 13:16 ` [PATCH 25/28] XFS: mark inodes for reclaim via a tag in the inode radix tree Dave Chinner
2008-08-19 13:16 ` [PATCH 26/28] XFS: rename inode reclaim functions Dave Chinner
2008-08-19 23:11 ` [PATCH 0/28] XFS: sync and reclaim rework Mark Goodwin
2008-08-19 23:39   ` Dave Chinner
2008-08-19 23:59 ` Dave Chinner
2008-08-22  1:44   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox