public inbox for linux-xfs@vger.kernel.org
* [PATCH 01/11] xfs: remove xfs_itruncate_data
  2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
@ 2011-12-08 15:57 ` Christoph Hellwig
  2011-12-13 21:23   ` Dave Chinner
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-08 15:57 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-xfs_itruncate_data --]
[-- Type: text/plain, Size: 14594 bytes --]

This wrapper isn't overly useful, not to say rather confusing.

Around the call to xfs_itruncate_extents it does:

 - add tracing
 - add a few asserts in debug builds
 - conditionally update the inode size in two places
 - log the inode

Both the tracing and the inode logging can be moved to xfs_itruncate_extents
as they are useful for the attribute fork as well - in fact the attr code
already does an equivalent xfs_trans_log_inode call just after calling
xfs_itruncate_extents.  The conditional size updates are a mess, and there
was no reason to do them in two places anyway, as the first one was
conditional on the inode having extents - but without extents
xfs_itruncate_extents would be a no-op and the placement wouldn't matter
anyway.  Instead move the size assignments and the asserts that make sense
to the callers that want it.

As a side effect of this, clean up xfs_setattr_size by introducing variables
for the old and new inode size, and moving the size updates into a common
place.
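
For illustration only, the caller-side pattern described above can be
sketched outside the kernel roughly as follows.  This is not the kernel
code: struct mock_inode and every mock_* helper are hypothetical stand-ins
for struct xfs_inode, xfs_trans_log_inode and xfs_itruncate_extents, and
the transaction handling is left out entirely.

```c
#include <assert.h>

/* Hypothetical stand-in for struct xfs_inode; not the real layout. */
struct mock_inode {
	long long	di_size;	/* on-disk size (ip->i_d.di_size) */
	long long	i_size;		/* in-core size (ip->i_size) */
	int		nextents;	/* extents in the data fork */
	int		logged;		/* how often the inode core was logged */
};

/* Stand-in for xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE). */
void mock_log_inode(struct mock_inode *ip)
{
	ip->logged++;
}

/*
 * Stand-in for xfs_itruncate_extents(): frees extents and re-logs the
 * inode, but never modifies the inode size.  Only the truncate-to-zero
 * case is modelled here.
 */
int mock_itruncate_extents(struct mock_inode *ip, long long new_size)
{
	if (new_size == 0)
		ip->nextents = 0;
	mock_log_inode(ip);
	return 0;
}

/*
 * Pattern for callers that really change the size (as in xfs_inactive
 * and xfs_qm_scall_trunc_qfile): set and log the new size first, then
 * free the extents, then assert on the result.
 */
int mock_truncate_to_zero(struct mock_inode *ip)
{
	int error;

	ip->di_size = 0;
	ip->i_size = 0;
	mock_log_inode(ip);

	error = mock_itruncate_extents(ip, 0);
	if (error)
		return error;

	assert(ip->nextents == 0);
	return 0;
}

/*
 * Pattern for xfs_free_eofblocks: only trim blocks past EOF and leave
 * both sizes alone.
 */
int mock_free_eofblocks(struct mock_inode *ip)
{
	return mock_itruncate_extents(ip, ip->i_size);
}
```

The point of the split is that the size update becomes an explicit
decision of each caller, while the logging and tracing that every caller
needs live in one place.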

Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_attr.c        |    4 -
 fs/xfs/xfs_inode.c       |  124 +++--------------------------------------------
 fs/xfs/xfs_inode.h       |    2 
 fs/xfs/xfs_iops.c        |   47 +++++++++++------
 fs/xfs/xfs_qm_syscalls.c |    9 +++
 fs/xfs/xfs_trace.h       |    4 -
 fs/xfs/xfs_vnodeops.c    |   17 +++++-
 7 files changed, 65 insertions(+), 142 deletions(-)

Index: xfs/fs/xfs/xfs_attr.c
===================================================================
--- xfs.orig/fs/xfs/xfs_attr.c	2011-11-30 12:58:07.820044461 +0100
+++ xfs/fs/xfs/xfs_attr.c	2011-11-30 12:58:51.519807719 +0100
@@ -827,10 +827,6 @@ xfs_attr_inactive(xfs_inode_t *dp)
 	if (error)
 		goto out;
 
-	/*
-	 * Commit the last in the sequence of transactions.
-	 */
-	xfs_trans_log_inode(trans, dp, XFS_ILOG_CORE);
 	error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-11-30 12:58:07.830044408 +0100
+++ xfs/fs/xfs/xfs_inode.c	2011-11-30 12:58:51.519807719 +0100
@@ -1166,52 +1166,6 @@ xfs_ialloc(
 }
 
 /*
- * Check to make sure that there are no blocks allocated to the
- * file beyond the size of the file.  We don't check this for
- * files with fixed size extents or real time extents, but we
- * at least do it for regular files.
- */
-#ifdef DEBUG
-STATIC void
-xfs_isize_check(
-	struct xfs_inode	*ip,
-	xfs_fsize_t		isize)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		map_first;
-	int			nimaps;
-	xfs_bmbt_irec_t		imaps[2];
-	int			error;
-
-	if (!S_ISREG(ip->i_d.di_mode))
-		return;
-
-	if (XFS_IS_REALTIME_INODE(ip))
-		return;
-
-	if (ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE)
-		return;
-
-	nimaps = 2;
-	map_first = XFS_B_TO_FSB(mp, (xfs_ufsize_t)isize);
-	/*
-	 * The filesystem could be shutting down, so bmapi may return
-	 * an error.
-	 */
-	error = xfs_bmapi_read(ip, map_first,
-			 (XFS_B_TO_FSB(mp,
-			       (xfs_ufsize_t)XFS_MAXIOFFSET(mp)) - map_first),
-			 imaps, &nimaps, XFS_BMAPI_ENTIRE);
-	if (error)
-		return;
-	ASSERT(nimaps == 1);
-	ASSERT(imaps[0].br_startblock == HOLESTARTBLOCK);
-}
-#else	/* DEBUG */
-#define xfs_isize_check(ip, isize)
-#endif	/* DEBUG */
-
-/*
  * Free up the underlying blocks past new_size.  The new size must be smaller
  * than the current size.  This routine can be used both for the attribute and
  * data fork, and does not modify the inode size, which is left to the caller.
@@ -1258,6 +1212,8 @@ xfs_itruncate_extents(
 	ASSERT(ip->i_itemp->ili_lock_flags == 0);
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 
+	trace_xfs_itruncate_extents_start(ip, new_size);
+
 	/*
 	 * Since it is possible for space to become allocated beyond
 	 * the end of the file (in a crash where the space is allocated
@@ -1325,6 +1281,14 @@ xfs_itruncate_extents(
 			goto out;
 	}
 
+	/*
+	 * Always re-log the inode so that our permanent transaction can keep
+	 * on rolling it forward in the log.
+	 */
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	trace_xfs_itruncate_extents_end(ip, new_size);
+
 out:
 	*tpp = tp;
 	return error;
@@ -1338,74 +1302,6 @@ out_bmap_cancel:
 	goto out;
 }
 
-int
-xfs_itruncate_data(
-	struct xfs_trans	**tpp,
-	struct xfs_inode	*ip,
-	xfs_fsize_t		new_size)
-{
-	int			error;
-
-	trace_xfs_itruncate_data_start(ip, new_size);
-
-	/*
-	 * The first thing we do is set the size to new_size permanently on
-	 * disk.  This way we don't have to worry about anyone ever being able
-	 * to look at the data being freed even in the face of a crash.
-	 * What we're getting around here is the case where we free a block, it
-	 * is allocated to another file, it is written to, and then we crash.
-	 * If the new data gets written to the file but the log buffers
-	 * containing the free and reallocation don't, then we'd end up with
-	 * garbage in the blocks being freed.  As long as we make the new_size
-	 * permanent before actually freeing any blocks it doesn't matter if
-	 * they get written to.
-	 */
-	if (ip->i_d.di_nextents > 0) {
-		/*
-		 * If we are not changing the file size then do not update
-		 * the on-disk file size - we may be called from
-		 * xfs_inactive_free_eofblocks().  If we update the on-disk
-		 * file size and then the system crashes before the contents
-		 * of the file are flushed to disk then the files may be
-		 * full of holes (ie NULL files bug).
-		 */
-		if (ip->i_size != new_size) {
-			ip->i_d.di_size = new_size;
-			ip->i_size = new_size;
-			xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
-		}
-	}
-
-	error = xfs_itruncate_extents(tpp, ip, XFS_DATA_FORK, new_size);
-	if (error)
-		return error;
-
-	/*
-	 * If we are not changing the file size then do not update the on-disk
-	 * file size - we may be called from xfs_inactive_free_eofblocks().
-	 * If we update the on-disk file size and then the system crashes
-	 * before the contents of the file are flushed to disk then the files
-	 * may be full of holes (ie NULL files bug).
-	 */
-	xfs_isize_check(ip, new_size);
-	if (ip->i_size != new_size) {
-		ip->i_d.di_size = new_size;
-		ip->i_size = new_size;
-	}
-
-	ASSERT(new_size != 0 || ip->i_delayed_blks == 0);
-	ASSERT(new_size != 0 || ip->i_d.di_nextents == 0);
-
-	/*
-	 * Always re-log the inode so that our permanent transaction can keep
-	 * on rolling it forward in the log.
-	 */
-	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
-
-	trace_xfs_itruncate_data_end(ip, new_size);
-	return 0;
-}
-
 /*
  * This is called when the inode's link count goes to 0.
  * We place the on-disk inode on a list in the AGI.  It
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:58:08.843372251 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-11-30 12:58:51.523141034 +0100
@@ -491,8 +491,6 @@ int		xfs_ifree(struct xfs_trans *, xfs_i
 			   struct xfs_bmap_free *);
 int		xfs_itruncate_extents(struct xfs_trans **, struct xfs_inode *,
 				      int, xfs_fsize_t);
-int		xfs_itruncate_data(struct xfs_trans **, struct xfs_inode *,
-				   xfs_fsize_t);
 int		xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
 
 void		xfs_iext_realloc(xfs_inode_t *, int, int);
Index: xfs/fs/xfs/xfs_iops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iops.c	2011-11-30 12:58:07.856710929 +0100
+++ xfs/fs/xfs/xfs_iops.c	2011-11-30 12:58:51.523141034 +0100
@@ -750,6 +750,7 @@ xfs_setattr_size(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct inode		*inode = VFS_I(ip);
 	int			mask = iattr->ia_valid;
+	xfs_off_t		oldsize, newsize;
 	struct xfs_trans	*tp;
 	int			error;
 	uint			lock_flags;
@@ -777,11 +778,13 @@ xfs_setattr_size(
 		lock_flags |= XFS_IOLOCK_EXCL;
 	xfs_ilock(ip, lock_flags);
 
+	oldsize = ip->i_size;
+	newsize = iattr->ia_size;
+
 	/*
 	 * Short circuit the truncate case for zero length files.
 	 */
-	if (iattr->ia_size == 0 &&
-	    ip->i_size == 0 && ip->i_d.di_nextents == 0) {
+	if (newsize == 0 && oldsize == 0 && ip->i_d.di_nextents == 0) {
 		if (!(mask & (ATTR_CTIME|ATTR_MTIME)))
 			goto out_unlock;
 
@@ -807,14 +810,14 @@ xfs_setattr_size(
 	 * the inode to the transaction, because the inode cannot be unlocked
 	 * once it is a part of the transaction.
 	 */
-	if (iattr->ia_size > ip->i_size) {
+	if (newsize > oldsize) {
 		/*
 		 * Do the first part of growing a file: zero any data in the
 		 * last block that is beyond the old EOF.  We need to do this
 		 * before the inode is joined to the transaction to modify
 		 * i_size.
 		 */
-		error = xfs_zero_eof(ip, iattr->ia_size, ip->i_size);
+		error = xfs_zero_eof(ip, newsize, oldsize);
 		if (error)
 			goto out_unlock;
 	}
@@ -833,8 +836,8 @@ xfs_setattr_size(
 	 * here and prevents waiting for other data not within the range we
 	 * care about here.
 	 */
-	if (ip->i_size != ip->i_d.di_size && iattr->ia_size > ip->i_d.di_size) {
-		error = xfs_flush_pages(ip, ip->i_d.di_size, iattr->ia_size, 0,
+	if (oldsize != ip->i_d.di_size && newsize > ip->i_d.di_size) {
+		error = xfs_flush_pages(ip, ip->i_d.di_size, newsize, 0,
 					FI_NONE);
 		if (error)
 			goto out_unlock;
@@ -845,8 +848,7 @@ xfs_setattr_size(
 	 */
 	inode_dio_wait(inode);
 
-	error = -block_truncate_page(inode->i_mapping, iattr->ia_size,
-				     xfs_get_blocks);
+	error = -block_truncate_page(inode->i_mapping, newsize, xfs_get_blocks);
 	if (error)
 		goto out_unlock;
 
@@ -857,7 +859,7 @@ xfs_setattr_size(
 	if (error)
 		goto out_trans_cancel;
 
-	truncate_setsize(inode, iattr->ia_size);
+	truncate_setsize(inode, newsize);
 
 	commit_flags = XFS_TRANS_RELEASE_LOG_RES;
 	lock_flags |= XFS_ILOCK_EXCL;
@@ -876,19 +878,30 @@ xfs_setattr_size(
 	 * these flags set.  For all other operations the VFS set these flags
 	 * explicitly if it wants a timestamp update.
 	 */
-	if (iattr->ia_size != ip->i_size &&
-	    (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
+	if (newsize != oldsize && (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
 		iattr->ia_ctime = iattr->ia_mtime =
 			current_fs_time(inode->i_sb);
 		mask |= ATTR_CTIME | ATTR_MTIME;
 	}
 
-	if (iattr->ia_size > ip->i_size) {
-		ip->i_d.di_size = iattr->ia_size;
-		ip->i_size = iattr->ia_size;
-	} else if (iattr->ia_size <= ip->i_size ||
-		   (iattr->ia_size == 0 && ip->i_d.di_nextents)) {
-		error = xfs_itruncate_data(&tp, ip, iattr->ia_size);
+	/*
+	 * The first thing we do is set the size to new_size permanently on
+	 * disk.  This way we don't have to worry about anyone ever being able
+	 * to look at the data being freed even in the face of a crash.
+	 * What we're getting around here is the case where we free a block, it
+	 * is allocated to another file, it is written to, and then we crash.
+	 * If the new data gets written to the file but the log buffers
+	 * containing the free and reallocation don't, then we'd end up with
+	 * garbage in the blocks being freed.  As long as we make the new size
+	 * permanent before actually freeing any blocks it doesn't matter if
+	 * they get written to.
+	 */
+	ip->i_d.di_size = newsize;
+	ip->i_size = newsize;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	if (newsize <= oldsize) {
+		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);
 		if (error)
 			goto out_trans_abort;
 
Index: xfs/fs/xfs/xfs_qm_syscalls.c
===================================================================
--- xfs.orig/fs/xfs/xfs_qm_syscalls.c	2011-11-30 12:58:07.866710876 +0100
+++ xfs/fs/xfs/xfs_qm_syscalls.c	2011-11-30 12:58:51.523141034 +0100
@@ -31,6 +31,7 @@
 #include "xfs_mount.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_inode.h"
+#include "xfs_inode_item.h"
 #include "xfs_itable.h"
 #include "xfs_bmap.h"
 #include "xfs_rtalloc.h"
@@ -263,13 +264,19 @@ xfs_qm_scall_trunc_qfile(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_itruncate_data(&tp, ip, 0);
+	ip->i_d.di_size = 0;
+	ip->i_size = 0;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
 	if (error) {
 		xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
 				     XFS_TRANS_ABORT);
 		goto out_unlock;
 	}
 
+	ASSERT(ip->i_d.di_nextents == 0);
+
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
 
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h	2011-11-30 12:58:07.876710822 +0100
+++ xfs/fs/xfs/xfs_trace.h	2011-11-30 12:58:51.523141034 +0100
@@ -1090,8 +1090,8 @@ DECLARE_EVENT_CLASS(xfs_itrunc_class,
 DEFINE_EVENT(xfs_itrunc_class, name, \
 	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size), \
 	TP_ARGS(ip, new_size))
-DEFINE_ITRUNC_EVENT(xfs_itruncate_data_start);
-DEFINE_ITRUNC_EVENT(xfs_itruncate_data_end);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_start);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_end);
 
 TRACE_EVENT(xfs_pagecache_inval,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t start, xfs_off_t finish),
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-11-30 12:58:07.893377397 +0100
+++ xfs/fs/xfs/xfs_vnodeops.c	2011-11-30 12:58:51.526474350 +0100
@@ -226,7 +226,14 @@ xfs_free_eofblocks(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip, 0);
 
-		error = xfs_itruncate_data(&tp, ip, ip->i_size);
+		/*
+		 * Do not update the on-disk file size.  If we update the
+		 * on-disk file size and then the system crashes before the
+		 * contents of the file are flushed to disk then the files
+		 * may be full of holes (ie NULL files bug).
+		 */
+		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
+					      ip->i_size);
 		if (error) {
 			/*
 			 * If we get an error at this point we simply don't
@@ -670,13 +677,19 @@ xfs_inactive(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip, 0);
 
-		error = xfs_itruncate_data(&tp, ip, 0);
+		ip->i_d.di_size = 0;
+		ip->i_size = 0;
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
 		if (error) {
 			xfs_trans_cancel(tp,
 				XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
 			xfs_iunlock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
 			return VN_INACTIVE_CACHE;
 		}
+
+		ASSERT(ip->i_d.di_nextents == 0);
 	} else if (S_ISLNK(ip->i_d.di_mode)) {
 
 		/*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH 01/11] xfs: remove xfs_itruncate_data
  2011-12-08 15:57 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
@ 2011-12-13 21:23   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2011-12-13 21:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Dec 08, 2011 at 10:57:56AM -0500, Christoph Hellwig wrote:
> This wrapper isn't overly useful, not to say rather confusing.
> 
> Around the call to xfs_itruncate_extents it does:
> 
>  - add tracing
>  - add a few asserts in debug builds
>  - conditionally update the inode size in two places
>  - log the inode
> 
> Both the tracing and the inode logging can be moved to xfs_itruncate_extents
> as they are useful for the attribute fork as well - in fact the attr code
> already does an equivalent xfs_trans_log_inode call just after calling
> xfs_itruncate_extents.  The conditional size updates are a mess, and there
> was no reason to do them in two places anyway, as the first one was
> conditional on the inode having extents - but without extents
> xfs_itruncate_extents would be a no-op and the placement wouldn't matter
> anyway.  Instead move the size assignments and the asserts that make sense
> to the callers that want it.
> 
> As a side effect of this, clean up xfs_setattr_size by introducing variables
> for the old and new inode size, and moving the size updates into a common
> place.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I can't see anything wrong with this, and it hasn't produced any
failures in my testing, so it looks OK.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* [PATCH 00/11] inode shrink and misc updates V2
@ 2011-12-18 20:00 Christoph Hellwig
  2011-12-18 20:00 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

This series contains the inode shrink patches and a few related cleanups.
Most notably the file write path is a lot leaner now that we could get rid
of all the magic for the xfs_inode i_size and i_new_size manipulations.

 31 files changed, 388 insertions(+), 559 deletions(-)

Changes from V1:
  - update commit messages based on Dave's feedback


* [PATCH 01/11] xfs: remove xfs_itruncate_data
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-03 21:53   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb Christoph Hellwig
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-xfs_itruncate_data --]
[-- Type: text/plain, Size: 14642 bytes --]

This wrapper isn't overly useful, not to say rather confusing.

Around the call to xfs_itruncate_extents it does:

 - add tracing
 - add a few asserts in debug builds
 - conditionally update the inode size in two places
 - log the inode

Both the tracing and the inode logging can be moved to xfs_itruncate_extents
as they are useful for the attribute fork as well - in fact the attr code
already does an equivalent xfs_trans_log_inode call just after calling
xfs_itruncate_extents.  The conditional size updates are a mess, and there
was no reason to do them in two places anyway, as the first one was
conditional on the inode having extents - but without extents
xfs_itruncate_extents would be a no-op and the placement wouldn't matter
anyway.  Instead move the size assignments and the asserts that make sense
to the callers that want it.

As a side effect of this, clean up xfs_setattr_size by introducing variables
for the old and new inode size, and moving the size updates into a common
place.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_attr.c        |    4 -
 fs/xfs/xfs_inode.c       |  124 +++--------------------------------------------
 fs/xfs/xfs_inode.h       |    2 
 fs/xfs/xfs_iops.c        |   47 +++++++++++------
 fs/xfs/xfs_qm_syscalls.c |    9 +++
 fs/xfs/xfs_trace.h       |    4 -
 fs/xfs/xfs_vnodeops.c    |   17 +++++-
 7 files changed, 65 insertions(+), 142 deletions(-)

Index: xfs/fs/xfs/xfs_attr.c
===================================================================
--- xfs.orig/fs/xfs/xfs_attr.c	2011-11-30 12:58:07.820044461 +0100
+++ xfs/fs/xfs/xfs_attr.c	2011-11-30 12:58:51.519807719 +0100
@@ -827,10 +827,6 @@ xfs_attr_inactive(xfs_inode_t *dp)
 	if (error)
 		goto out;
 
-	/*
-	 * Commit the last in the sequence of transactions.
-	 */
-	xfs_trans_log_inode(trans, dp, XFS_ILOG_CORE);
 	error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-11-30 12:58:07.830044408 +0100
+++ xfs/fs/xfs/xfs_inode.c	2011-11-30 12:58:51.519807719 +0100
@@ -1166,52 +1166,6 @@ xfs_ialloc(
 }
 
 /*
- * Check to make sure that there are no blocks allocated to the
- * file beyond the size of the file.  We don't check this for
- * files with fixed size extents or real time extents, but we
- * at least do it for regular files.
- */
-#ifdef DEBUG
-STATIC void
-xfs_isize_check(
-	struct xfs_inode	*ip,
-	xfs_fsize_t		isize)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		map_first;
-	int			nimaps;
-	xfs_bmbt_irec_t		imaps[2];
-	int			error;
-
-	if (!S_ISREG(ip->i_d.di_mode))
-		return;
-
-	if (XFS_IS_REALTIME_INODE(ip))
-		return;
-
-	if (ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE)
-		return;
-
-	nimaps = 2;
-	map_first = XFS_B_TO_FSB(mp, (xfs_ufsize_t)isize);
-	/*
-	 * The filesystem could be shutting down, so bmapi may return
-	 * an error.
-	 */
-	error = xfs_bmapi_read(ip, map_first,
-			 (XFS_B_TO_FSB(mp,
-			       (xfs_ufsize_t)XFS_MAXIOFFSET(mp)) - map_first),
-			 imaps, &nimaps, XFS_BMAPI_ENTIRE);
-	if (error)
-		return;
-	ASSERT(nimaps == 1);
-	ASSERT(imaps[0].br_startblock == HOLESTARTBLOCK);
-}
-#else	/* DEBUG */
-#define xfs_isize_check(ip, isize)
-#endif	/* DEBUG */
-
-/*
  * Free up the underlying blocks past new_size.  The new size must be smaller
  * than the current size.  This routine can be used both for the attribute and
  * data fork, and does not modify the inode size, which is left to the caller.
@@ -1258,6 +1212,8 @@ xfs_itruncate_extents(
 	ASSERT(ip->i_itemp->ili_lock_flags == 0);
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 
+	trace_xfs_itruncate_extents_start(ip, new_size);
+
 	/*
 	 * Since it is possible for space to become allocated beyond
 	 * the end of the file (in a crash where the space is allocated
@@ -1325,6 +1281,14 @@ xfs_itruncate_extents(
 			goto out;
 	}
 
+	/*
+	 * Always re-log the inode so that our permanent transaction can keep
+	 * on rolling it forward in the log.
+	 */
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	trace_xfs_itruncate_extents_end(ip, new_size);
+
 out:
 	*tpp = tp;
 	return error;
@@ -1338,74 +1302,6 @@ out_bmap_cancel:
 	goto out;
 }
 
-int
-xfs_itruncate_data(
-	struct xfs_trans	**tpp,
-	struct xfs_inode	*ip,
-	xfs_fsize_t		new_size)
-{
-	int			error;
-
-	trace_xfs_itruncate_data_start(ip, new_size);
-
-	/*
-	 * The first thing we do is set the size to new_size permanently on
-	 * disk.  This way we don't have to worry about anyone ever being able
-	 * to look at the data being freed even in the face of a crash.
-	 * What we're getting around here is the case where we free a block, it
-	 * is allocated to another file, it is written to, and then we crash.
-	 * If the new data gets written to the file but the log buffers
-	 * containing the free and reallocation don't, then we'd end up with
-	 * garbage in the blocks being freed.  As long as we make the new_size
-	 * permanent before actually freeing any blocks it doesn't matter if
-	 * they get written to.
-	 */
-	if (ip->i_d.di_nextents > 0) {
-		/*
-		 * If we are not changing the file size then do not update
-		 * the on-disk file size - we may be called from
-		 * xfs_inactive_free_eofblocks().  If we update the on-disk
-		 * file size and then the system crashes before the contents
-		 * of the file are flushed to disk then the files may be
-		 * full of holes (ie NULL files bug).
-		 */
-		if (ip->i_size != new_size) {
-			ip->i_d.di_size = new_size;
-			ip->i_size = new_size;
-			xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
-		}
-	}
-
-	error = xfs_itruncate_extents(tpp, ip, XFS_DATA_FORK, new_size);
-	if (error)
-		return error;
-
-	/*
-	 * If we are not changing the file size then do not update the on-disk
-	 * file size - we may be called from xfs_inactive_free_eofblocks().
-	 * If we update the on-disk file size and then the system crashes
-	 * before the contents of the file are flushed to disk then the files
-	 * may be full of holes (ie NULL files bug).
-	 */
-	xfs_isize_check(ip, new_size);
-	if (ip->i_size != new_size) {
-		ip->i_d.di_size = new_size;
-		ip->i_size = new_size;
-	}
-
-	ASSERT(new_size != 0 || ip->i_delayed_blks == 0);
-	ASSERT(new_size != 0 || ip->i_d.di_nextents == 0);
-
-	/*
-	 * Always re-log the inode so that our permanent transaction can keep
-	 * on rolling it forward in the log.
-	 */
-	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
-
-	trace_xfs_itruncate_data_end(ip, new_size);
-	return 0;
-}
-
 /*
  * This is called when the inode's link count goes to 0.
  * We place the on-disk inode on a list in the AGI.  It
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:58:08.843372251 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-11-30 12:58:51.523141034 +0100
@@ -491,8 +491,6 @@ int		xfs_ifree(struct xfs_trans *, xfs_i
 			   struct xfs_bmap_free *);
 int		xfs_itruncate_extents(struct xfs_trans **, struct xfs_inode *,
 				      int, xfs_fsize_t);
-int		xfs_itruncate_data(struct xfs_trans **, struct xfs_inode *,
-				   xfs_fsize_t);
 int		xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
 
 void		xfs_iext_realloc(xfs_inode_t *, int, int);
Index: xfs/fs/xfs/xfs_iops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iops.c	2011-11-30 12:58:07.856710929 +0100
+++ xfs/fs/xfs/xfs_iops.c	2011-11-30 12:58:51.523141034 +0100
@@ -750,6 +750,7 @@ xfs_setattr_size(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct inode		*inode = VFS_I(ip);
 	int			mask = iattr->ia_valid;
+	xfs_off_t		oldsize, newsize;
 	struct xfs_trans	*tp;
 	int			error;
 	uint			lock_flags;
@@ -777,11 +778,13 @@ xfs_setattr_size(
 		lock_flags |= XFS_IOLOCK_EXCL;
 	xfs_ilock(ip, lock_flags);
 
+	oldsize = ip->i_size;
+	newsize = iattr->ia_size;
+
 	/*
 	 * Short circuit the truncate case for zero length files.
 	 */
-	if (iattr->ia_size == 0 &&
-	    ip->i_size == 0 && ip->i_d.di_nextents == 0) {
+	if (newsize == 0 && oldsize == 0 && ip->i_d.di_nextents == 0) {
 		if (!(mask & (ATTR_CTIME|ATTR_MTIME)))
 			goto out_unlock;
 
@@ -807,14 +810,14 @@ xfs_setattr_size(
 	 * the inode to the transaction, because the inode cannot be unlocked
 	 * once it is a part of the transaction.
 	 */
-	if (iattr->ia_size > ip->i_size) {
+	if (newsize > oldsize) {
 		/*
 		 * Do the first part of growing a file: zero any data in the
 		 * last block that is beyond the old EOF.  We need to do this
 		 * before the inode is joined to the transaction to modify
 		 * i_size.
 		 */
-		error = xfs_zero_eof(ip, iattr->ia_size, ip->i_size);
+		error = xfs_zero_eof(ip, newsize, oldsize);
 		if (error)
 			goto out_unlock;
 	}
@@ -833,8 +836,8 @@ xfs_setattr_size(
 	 * here and prevents waiting for other data not within the range we
 	 * care about here.
 	 */
-	if (ip->i_size != ip->i_d.di_size && iattr->ia_size > ip->i_d.di_size) {
-		error = xfs_flush_pages(ip, ip->i_d.di_size, iattr->ia_size, 0,
+	if (oldsize != ip->i_d.di_size && newsize > ip->i_d.di_size) {
+		error = xfs_flush_pages(ip, ip->i_d.di_size, newsize, 0,
 					FI_NONE);
 		if (error)
 			goto out_unlock;
@@ -845,8 +848,7 @@ xfs_setattr_size(
 	 */
 	inode_dio_wait(inode);
 
-	error = -block_truncate_page(inode->i_mapping, iattr->ia_size,
-				     xfs_get_blocks);
+	error = -block_truncate_page(inode->i_mapping, newsize, xfs_get_blocks);
 	if (error)
 		goto out_unlock;
 
@@ -857,7 +859,7 @@ xfs_setattr_size(
 	if (error)
 		goto out_trans_cancel;
 
-	truncate_setsize(inode, iattr->ia_size);
+	truncate_setsize(inode, newsize);
 
 	commit_flags = XFS_TRANS_RELEASE_LOG_RES;
 	lock_flags |= XFS_ILOCK_EXCL;
@@ -876,19 +878,30 @@ xfs_setattr_size(
 	 * these flags set.  For all other operations the VFS set these flags
 	 * explicitly if it wants a timestamp update.
 	 */
-	if (iattr->ia_size != ip->i_size &&
-	    (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
+	if (newsize != oldsize && (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
 		iattr->ia_ctime = iattr->ia_mtime =
 			current_fs_time(inode->i_sb);
 		mask |= ATTR_CTIME | ATTR_MTIME;
 	}
 
-	if (iattr->ia_size > ip->i_size) {
-		ip->i_d.di_size = iattr->ia_size;
-		ip->i_size = iattr->ia_size;
-	} else if (iattr->ia_size <= ip->i_size ||
-		   (iattr->ia_size == 0 && ip->i_d.di_nextents)) {
-		error = xfs_itruncate_data(&tp, ip, iattr->ia_size);
+	/*
+	 * The first thing we do is set the size to new_size permanently on
+	 * disk.  This way we don't have to worry about anyone ever being able
+	 * to look at the data being freed even in the face of a crash.
+	 * What we're getting around here is the case where we free a block, it
+	 * is allocated to another file, it is written to, and then we crash.
+	 * If the new data gets written to the file but the log buffers
+	 * containing the free and reallocation don't, then we'd end up with
+	 * garbage in the blocks being freed.  As long as we make the new size
+	 * permanent before actually freeing any blocks it doesn't matter if
+	 * they get written to.
+	 */
+	ip->i_d.di_size = newsize;
+	ip->i_size = newsize;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	if (newsize <= oldsize) {
+		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);
 		if (error)
 			goto out_trans_abort;
 
Index: xfs/fs/xfs/xfs_qm_syscalls.c
===================================================================
--- xfs.orig/fs/xfs/xfs_qm_syscalls.c	2011-11-30 12:58:07.866710876 +0100
+++ xfs/fs/xfs/xfs_qm_syscalls.c	2011-11-30 12:58:51.523141034 +0100
@@ -31,6 +31,7 @@
 #include "xfs_mount.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_inode.h"
+#include "xfs_inode_item.h"
 #include "xfs_itable.h"
 #include "xfs_bmap.h"
 #include "xfs_rtalloc.h"
@@ -263,13 +264,19 @@ xfs_qm_scall_trunc_qfile(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_itruncate_data(&tp, ip, 0);
+	ip->i_d.di_size = 0;
+	ip->i_size = 0;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
 	if (error) {
 		xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
 				     XFS_TRANS_ABORT);
 		goto out_unlock;
 	}
 
+	ASSERT(ip->i_d.di_nextents == 0);
+
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
 
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h	2011-11-30 12:58:07.876710822 +0100
+++ xfs/fs/xfs/xfs_trace.h	2011-11-30 12:58:51.523141034 +0100
@@ -1090,8 +1090,8 @@ DECLARE_EVENT_CLASS(xfs_itrunc_class,
 DEFINE_EVENT(xfs_itrunc_class, name, \
 	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size), \
 	TP_ARGS(ip, new_size))
-DEFINE_ITRUNC_EVENT(xfs_itruncate_data_start);
-DEFINE_ITRUNC_EVENT(xfs_itruncate_data_end);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_start);
+DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_end);
 
 TRACE_EVENT(xfs_pagecache_inval,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t start, xfs_off_t finish),
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-11-30 12:58:07.893377397 +0100
+++ xfs/fs/xfs/xfs_vnodeops.c	2011-11-30 12:58:51.526474350 +0100
@@ -226,7 +226,14 @@ xfs_free_eofblocks(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip, 0);
 
-		error = xfs_itruncate_data(&tp, ip, ip->i_size);
+		/*
+		 * Do not update the on-disk file size.  If we update the
+		 * on-disk file size and then the system crashes before the
+		 * contents of the file are flushed to disk then the files
+		 * may be full of holes (ie NULL files bug).
+		 */
+		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
+					      ip->i_size);
 		if (error) {
 			/*
 			 * If we get an error at this point we simply don't
@@ -670,13 +677,19 @@ xfs_inactive(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip, 0);
 
-		error = xfs_itruncate_data(&tp, ip, 0);
+		ip->i_d.di_size = 0;
+		ip->i_size = 0;
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
 		if (error) {
 			xfs_trans_cancel(tp,
 				XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
 			xfs_iunlock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
 			return VN_INACTIVE_CACHE;
 		}
+
+		ASSERT(ip->i_d.di_nextents == 0);
 	} else if (S_ISLNK(ip->i_d.di_mode)) {
 
 		/*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
  2011-12-18 20:00 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-04 20:32   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 03/11] xfs: remove the unused dm_attrs structure Christoph Hellwig
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-xfs_iomap_eof_align_last_fsb --]
[-- Type: text/plain, Size: 2278 bytes --]

Replace the nasty if / else if / else if chain with more natural C flow
that expresses the logic we want here better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

---
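
Not part of the patch: a minimal user-space sketch of the restructured
flow, assuming a hypothetical struct mount_model in place of struct
xfs_mount and a local roundup_u64() in place of roundup_64().  It models
only the stripe alignment step, not the extent size rounding that follows
it in the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few mount fields this logic reads. */
struct mount_model {
	uint32_t	swidth;		/* stripe width, in fs blocks */
	uint32_t	dalign;		/* stripe unit, in fs blocks */
	bool		swalloc;	/* models XFS_MOUNT_SWALLOC */
	unsigned int	blocklog;	/* log2 of the fs block size */
};

/* Round x up to the next multiple of y; y must be non-zero. */
static uint64_t roundup_u64(uint64_t x, uint32_t y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/*
 * Return the stripe-aligned last block, or 0 if no stripe alignment
 * applies (realtime inode, no stripe geometry, or file too small).
 */
static uint64_t stripe_align_last_fsb(const struct mount_model *mp,
				      bool realtime, uint64_t isize,
				      uint64_t last_fsb)
{
	uint64_t	new_last_fsb = 0;
	uint32_t	align = 0;

	if (!realtime) {
		/* Pick the alignment first ... */
		if (mp->swidth && mp->swalloc)
			align = mp->swidth;
		else if (mp->dalign)
			align = mp->dalign;

		/* ... then apply the size check once, against that value. */
		if (align && isize >= ((uint64_t)align << mp->blocklog))
			new_last_fsb = roundup_u64(last_fsb, align);
	}
	return new_last_fsb;
}
```

One thing the sketch makes visible: with swalloc set and a file smaller
than the stripe width, no stripe alignment is applied at all, whereas the
old chain would have fallen through to the stripe unit check.
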
 fs/xfs/xfs_iomap.c |   36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

Index: xfs/fs/xfs/xfs_iomap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iomap.c	2011-11-17 12:07:52.580802800 +0100
+++ xfs/fs/xfs/xfs_iomap.c	2011-11-30 10:57:35.649224495 +0100
@@ -57,26 +57,26 @@ xfs_iomap_eof_align_last_fsb(
 	xfs_fileoff_t	*last_fsb)
 {
 	xfs_fileoff_t	new_last_fsb = 0;
-	xfs_extlen_t	align;
+	xfs_extlen_t	align = 0;
 	int		eof, error;
 
-	if (XFS_IS_REALTIME_INODE(ip))
-		;
-	/*
-	 * If mounted with the "-o swalloc" option, roundup the allocation
-	 * request to a stripe width boundary if the file size is >=
-	 * stripe width and we are allocating past the allocation eof.
-	 */
-	else if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC) &&
-	        (ip->i_size >= XFS_FSB_TO_B(mp, mp->m_swidth)))
-		new_last_fsb = roundup_64(*last_fsb, mp->m_swidth);
-	/*
-	 * Roundup the allocation request to a stripe unit (m_dalign) boundary
-	 * if the file size is >= stripe unit size, and we are allocating past
-	 * the allocation eof.
-	 */
-	else if (mp->m_dalign && (ip->i_size >= XFS_FSB_TO_B(mp, mp->m_dalign)))
-		new_last_fsb = roundup_64(*last_fsb, mp->m_dalign);
+	if (!XFS_IS_REALTIME_INODE(ip)) {
+		/*
+		 * Round up the allocation request to a stripe unit
+		 * (m_dalign) boundary if the file size is >= stripe unit
+		 * size, and we are allocating past the allocation eof.
+		 *
+		 * If mounted with the "-o swalloc" option the alignment is
+		 * increased from the stripe unit size to the stripe width.
+		 */
+		if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
+			align = mp->m_swidth;
+		else if (mp->m_dalign)
+			align = mp->m_dalign;
+
+		if (align && ip->i_size >= XFS_FSB_TO_B(mp, align))
+			new_last_fsb = roundup_64(*last_fsb, align);
+	}
 
 	/*
 	 * Always round up the allocation request to an extent boundary


* [PATCH 03/11] xfs: remove the unused dm_attrs structure
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
  2011-12-18 20:00 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
  2011-12-18 20:00 ` [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-04 21:13   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork Christoph Hellwig
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-remove-dm_attrs --]
[-- Type: text/plain, Size: 1190 bytes --]

.. and the just as dead bhv_desc forward declaration while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

---
 fs/xfs/xfs_inode.h |    7 -------
 1 file changed, 7 deletions(-)

Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 10:56:57.396098395 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-11-30 10:57:46.375833051 +0100
@@ -211,7 +211,6 @@ typedef struct xfs_icdinode {
 
 #ifdef __KERNEL__
 
-struct bhv_desc;
 struct xfs_buf;
 struct xfs_bmap_free;
 struct xfs_bmbt_irec;
@@ -220,12 +219,6 @@ struct xfs_mount;
 struct xfs_trans;
 struct xfs_dquot;
 
-typedef struct dm_attrs_s {
-	__uint32_t	da_dmevmask;	/* DMIG event mask */
-	__uint16_t	da_dmstate;	/* DMIG state info */
-	__uint16_t	da_pad;		/* DMIG extra padding */
-} dm_attrs_t;
-
 typedef struct xfs_inode {
 	/* Inode linking and identification information. */
 	struct xfs_mount	*i_mount;	/* fs mount struct ptr */


* [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 03/11] xfs: remove the unused dm_attrs structure Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-06 16:58   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 05/11] xfs: make i_flags an unsigned long Christoph Hellwig
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-ifp-if_ext_max --]
[-- Type: text/plain, Size: 19985 bytes --]

We spend a lot of effort maintaining this field, but it always equals the
fork size divided by the constant size of an extent.  The prime use of it is
to assert that the two stay in sync.  Just divide the fork size by the extent
size in the few places that we actually use it and remove the overhead
of maintaining it.  Also introduce a few helpers to consolidate the places
where we actually care about the value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

---
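
Not part of the patch: a small user-space model of the derived value and
the two helpers, assuming hypothetical struct fork_model and
struct bmbt_rec_model types in place of the real inode fork and
xfs_bmbt_rec_t (taken here to be two 64-bit words).  It is only meant to
show why a cached maximum is unnecessary once the value is computed from
the fork size on demand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed extent record layout: two 64-bit words, so 16 bytes. */
struct bmbt_rec_model {
	uint64_t	l0;
	uint64_t	l1;
};

enum fork_format { FMT_EXTENTS, FMT_BTREE };

/* Hypothetical stand-in for one inode fork. */
struct fork_model {
	enum fork_format format;
	size_t		fork_bytes;	/* space the fork has in the inode */
	unsigned int	nextents;	/* extents currently mapped */
};

/* Computed on demand, in the spirit of XFS_IFORK_MAXEXT(). */
static size_t fork_maxext(const struct fork_model *f)
{
	return f->fork_bytes / sizeof(struct bmbt_rec_model);
}

/* Extent format, but more extents than fit inline. */
static bool needs_btree(const struct fork_model *f)
{
	return f->format == FMT_EXTENTS && f->nextents > fork_maxext(f);
}

/* Btree format, but the extents would fit inline again. */
static bool wants_extents(const struct fork_model *f)
{
	return f->format == FMT_BTREE && f->nextents <= fork_maxext(f);
}
```

Shrinking fork_bytes (as happens when the fork offset moves) changes the
answer of both helpers immediately, with nothing to resynchronise.  Note
that needs_btree compares with ">", so a call site that previously
tested ">=" against if_ext_max, as the xfs_bunmapi hunk below does, is
not a purely mechanical substitution.
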
 fs/xfs/xfs_attr_leaf.c  |    9 ----
 fs/xfs/xfs_bmap.c       |  101 +++++++++++++++++++++---------------------------
 fs/xfs/xfs_dfrag.c      |   39 +++++++++---------
 fs/xfs/xfs_iget.c       |    2 
 fs/xfs/xfs_inode.c      |   30 ++++----------
 fs/xfs/xfs_inode.h      |    4 -
 fs/xfs/xfs_inode_item.c |    2 
 fs/xfs/xfs_trace.h      |    5 --
 8 files changed, 78 insertions(+), 114 deletions(-)

Index: xfs/fs/xfs/xfs_attr_leaf.c
===================================================================
--- xfs.orig/fs/xfs/xfs_attr_leaf.c	2011-12-12 10:33:55.735363609 -0800
+++ xfs/fs/xfs/xfs_attr_leaf.c	2011-12-14 05:15:20.609040372 -0800
@@ -271,10 +271,6 @@ xfs_attr_shortform_add(xfs_da_args_t *ar
 	dp = args->dp;
 	mp = dp->i_mount;
 	dp->i_d.di_forkoff = forkoff;
-	dp->i_df.if_ext_max =
-		XFS_IFORK_DSIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
-	dp->i_afp->if_ext_max =
-		XFS_IFORK_ASIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
 
 	ifp = dp->i_afp;
 	ASSERT(ifp->if_flags & XFS_IFINLINE);
@@ -326,7 +322,6 @@ xfs_attr_fork_reset(
 	ASSERT(ip->i_d.di_anextents == 0);
 	ASSERT(ip->i_afp == NULL);
 
-	ip->i_df.if_ext_max = XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 }
 
@@ -389,10 +384,6 @@ xfs_attr_shortform_remove(xfs_da_args_t
 				(args->op_flags & XFS_DA_OP_ADDNAME) ||
 				!(mp->m_flags & XFS_MOUNT_ATTR2) ||
 				dp->i_d.di_format == XFS_DINODE_FMT_BTREE);
-		dp->i_afp->if_ext_max =
-			XFS_IFORK_ASIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
-		dp->i_df.if_ext_max =
-			XFS_IFORK_DSIZE(dp) / (uint)sizeof(xfs_bmbt_rec_t);
 		xfs_trans_log_inode(args->trans, dp,
 					XFS_ILOG_CORE | XFS_ILOG_ADATA);
 	}
Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c	2011-12-12 10:33:55.748696870 -0800
+++ xfs/fs/xfs/xfs_bmap.c	2011-12-14 05:15:20.612373687 -0800
@@ -249,7 +249,27 @@ xfs_bmbt_lookup_ge(
 }
 
 /*
-* Update the record referred to by cur to the value given
+ * Check if the inode needs to be converted to btree format.
+ */
+static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
+{
+	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
+		XFS_IFORK_NEXTENTS(ip, whichfork) >
+			XFS_IFORK_MAXEXT(ip, whichfork);
+}
+
+/*
+ * Check if the inode should be converted to extent format.
+ */
+static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
+{
+	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
+		XFS_IFORK_NEXTENTS(ip, whichfork) <=
+			XFS_IFORK_MAXEXT(ip, whichfork);
+}
+
+/*
+ * Update the record referred to by cur to the value given
  * by [off, bno, len, state].
  * This either works (return 0) or gets an EFSCORRUPTED error.
  */
@@ -683,8 +703,8 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, done);
 		}
-		if (bma->ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
-		    bma->ip->i_d.di_nextents > bma->ip->i_df.if_ext_max) {
+
+		if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
 					bma->firstblock, bma->flist,
 					&bma->cur, 1, &tmp_rval, XFS_DATA_FORK);
@@ -767,8 +787,8 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, done);
 		}
-		if (bma->ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
-		    bma->ip->i_d.di_nextents > bma->ip->i_df.if_ext_max) {
+
+		if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
 				bma->firstblock, bma->flist, &bma->cur, 1,
 				&tmp_rval, XFS_DATA_FORK);
@@ -836,8 +856,8 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, done);
 		}
-		if (bma->ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
-		    bma->ip->i_d.di_nextents > bma->ip->i_df.if_ext_max) {
+
+		if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
 					bma->firstblock, bma->flist, &bma->cur,
 					1, &tmp_rval, XFS_DATA_FORK);
@@ -884,8 +904,7 @@ xfs_bmap_add_extent_delay_real(
 	}
 
 	/* convert to a btree if necessary */
-	if (XFS_IFORK_FORMAT(bma->ip, XFS_DATA_FORK) == XFS_DINODE_FMT_EXTENTS &&
-	    XFS_IFORK_NEXTENTS(bma->ip, XFS_DATA_FORK) > ifp->if_ext_max) {
+	if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
 		int	tmp_logflags;	/* partial log flag return val */
 
 		ASSERT(bma->cur == NULL);
@@ -1421,8 +1440,7 @@ xfs_bmap_add_extent_unwritten_real(
 	}
 
 	/* convert to a btree if necessary */
-	if (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) == XFS_DINODE_FMT_EXTENTS &&
-	    XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) > ifp->if_ext_max) {
+	if (xfs_bmap_needs_btree(ip, XFS_DATA_FORK)) {
 		int	tmp_logflags;	/* partial log flag return val */
 
 		ASSERT(cur == NULL);
@@ -1812,8 +1830,7 @@ xfs_bmap_add_extent_hole_real(
 	}
 
 	/* convert to a btree if necessary */
-	if (XFS_IFORK_FORMAT(bma->ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
-	    XFS_IFORK_NEXTENTS(bma->ip, whichfork) > ifp->if_ext_max) {
+	if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 		int	tmp_logflags;	/* partial log flag return val */
 
 		ASSERT(bma->cur == NULL);
@@ -3037,8 +3054,7 @@ xfs_bmap_extents_to_btree(
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
-	ASSERT(ifp->if_ext_max ==
-	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
 	/*
 	 * Make space in the inode incore.
 	 */
@@ -3184,13 +3200,8 @@ xfs_bmap_forkoff_reset(
 	    ip->i_d.di_format != XFS_DINODE_FMT_BTREE) {
 		uint	dfl_forkoff = xfs_default_attroffset(ip) >> 3;
 
-		if (dfl_forkoff > ip->i_d.di_forkoff) {
+		if (dfl_forkoff > ip->i_d.di_forkoff)
 			ip->i_d.di_forkoff = dfl_forkoff;
-			ip->i_df.if_ext_max =
-				XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t);
-			ip->i_afp->if_ext_max =
-				XFS_IFORK_ASIZE(ip) / sizeof(xfs_bmbt_rec_t);
-		}
 	}
 }
 
@@ -3430,8 +3441,6 @@ xfs_bmap_add_attrfork(
 	int			error;		/* error return value */
 
 	ASSERT(XFS_IFORK_Q(ip) == 0);
-	ASSERT(ip->i_df.if_ext_max ==
-	       XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
 
 	mp = ip->i_mount;
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
@@ -3486,12 +3495,9 @@ xfs_bmap_add_attrfork(
 		error = XFS_ERROR(EINVAL);
 		goto error1;
 	}
-	ip->i_df.if_ext_max =
-		XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
+
 	ASSERT(ip->i_afp == NULL);
 	ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP);
-	ip->i_afp->if_ext_max =
-		XFS_IFORK_ASIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
 	ip->i_afp->if_flags = XFS_IFEXTENTS;
 	logflags = 0;
 	xfs_bmap_init(&flist, &firstblock);
@@ -3535,20 +3541,17 @@ xfs_bmap_add_attrfork(
 		} else
 			spin_unlock(&mp->m_sb_lock);
 	}
-	if ((error = xfs_bmap_finish(&tp, &flist, &committed)))
+
+	error = xfs_bmap_finish(&tp, &flist, &committed);
+	if (error)
 		goto error2;
-	error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
-	ASSERT(ip->i_df.if_ext_max ==
-	       XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
-	return error;
+	return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
 error2:
 	xfs_bmap_cancel(&flist);
 error1:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 error0:
 	xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_ABORT);
-	ASSERT(ip->i_df.if_ext_max ==
-	       XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
 	return error;
 }
 
@@ -4379,8 +4382,6 @@ xfs_bmapi_read(
 	XFS_STATS_INC(xs_blk_mapr);
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
-	ASSERT(ifp->if_ext_max ==
-	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
 
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(NULL, ip, whichfork);
@@ -4871,8 +4872,6 @@ xfs_bmapi_write(
 		return XFS_ERROR(EIO);
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
-	ASSERT(ifp->if_ext_max ==
-	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
 
 	XFS_STATS_INC(xs_blk_mapw);
 
@@ -4981,8 +4980,7 @@ xfs_bmapi_write(
 	/*
 	 * Transform from btree to extents, give it cur.
 	 */
-	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
-	    XFS_IFORK_NEXTENTS(ip, whichfork) <= ifp->if_ext_max) {
+	if (xfs_bmap_wants_extents(ip, whichfork)) {
 		int		tmp_logflags = 0;
 
 		ASSERT(bma.cur);
@@ -4992,10 +4990,10 @@ xfs_bmapi_write(
 		if (error)
 			goto error0;
 	}
-	ASSERT(ifp->if_ext_max ==
-	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE ||
-	       XFS_IFORK_NEXTENTS(ip, whichfork) > ifp->if_ext_max);
+	       XFS_IFORK_NEXTENTS(ip, whichfork) >
+		XFS_IFORK_MAXEXT(ip, whichfork));
 	error = 0;
 error0:
 	/*
@@ -5095,8 +5093,7 @@ xfs_bunmapi(
 
 	ASSERT(len > 0);
 	ASSERT(nexts >= 0);
-	ASSERT(ifp->if_ext_max ==
-	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
 	if (!(ifp->if_flags & XFS_IFEXTENTS) &&
 	    (error = xfs_iread_extents(tp, ip, whichfork)))
 		return error;
@@ -5321,8 +5318,7 @@ xfs_bunmapi(
 		 * will be dirty.
 		 */
 		if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
-		    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
-		    XFS_IFORK_NEXTENTS(ip, whichfork) >= ifp->if_ext_max &&
+		    xfs_bmap_needs_btree(ip, whichfork) &&
 		    del.br_startoff > got.br_startoff &&
 		    del.br_startoff + del.br_blockcount <
 		    got.br_startoff + got.br_blockcount) {
@@ -5353,13 +5349,11 @@ nodelete:
 		}
 	}
 	*done = bno == (xfs_fileoff_t)-1 || bno < start || lastx < 0;
-	ASSERT(ifp->if_ext_max ==
-	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
+
 	/*
 	 * Convert to a btree if necessary.
 	 */
-	if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
-	    XFS_IFORK_NEXTENTS(ip, whichfork) > ifp->if_ext_max) {
+	if (xfs_bmap_needs_btree(ip, whichfork)) {
 		ASSERT(cur == NULL);
 		error = xfs_bmap_extents_to_btree(tp, ip, firstblock, flist,
 			&cur, 0, &tmp_logflags, whichfork);
@@ -5370,8 +5364,7 @@ nodelete:
 	/*
 	 * transform from btree to extents, give it cur
 	 */
-	else if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
-		 XFS_IFORK_NEXTENTS(ip, whichfork) <= ifp->if_ext_max) {
+	else if (xfs_bmap_wants_extents(ip, whichfork)) {
 		ASSERT(cur != NULL);
 		error = xfs_bmap_btree_to_extents(tp, ip, cur, &tmp_logflags,
 			whichfork);
@@ -5382,8 +5375,6 @@ nodelete:
 	/*
 	 * transform from extents to local?
 	 */
-	ASSERT(ifp->if_ext_max ==
-	       XFS_IFORK_SIZE(ip, whichfork) / (uint)sizeof(xfs_bmbt_rec_t));
 	error = 0;
 error0:
 	/*
Index: xfs/fs/xfs/xfs_dfrag.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dfrag.c	2011-12-12 10:33:55.758696817 -0800
+++ xfs/fs/xfs/xfs_dfrag.c	2011-12-14 05:16:50.911884493 -0800
@@ -163,12 +163,14 @@ xfs_swap_extents_check_format(
 
 	/* Check temp in extent form to max in target */
 	if (tip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
-	    XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) > ip->i_df.if_ext_max)
+	    XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) >
+			XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
 		return EINVAL;
 
 	/* Check target in extent form to max in temp */
 	if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
-	    XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) > tip->i_df.if_ext_max)
+	    XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) >
+			XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
 		return EINVAL;
 
 	/*
@@ -180,18 +182,25 @@ xfs_swap_extents_check_format(
 	 * (a common defrag case) which will occur when the temp inode is in
 	 * extent format...
 	 */
-	if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
-	    ((XFS_IFORK_BOFF(ip) &&
-	      tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip)) ||
-	     XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <= ip->i_df.if_ext_max))
-		return EINVAL;
+	if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+		if (XFS_IFORK_BOFF(ip) &&
+		    tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
+			return EINVAL;
+		if (XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <=
+		    XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
+			return EINVAL;
+	}
 
 	/* Reciprocal target->temp btree format checks */
-	if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
-	    ((XFS_IFORK_BOFF(tip) &&
-	      ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip)) ||
-	     XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <= tip->i_df.if_ext_max))
-		return EINVAL;
+	if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+		if (XFS_IFORK_BOFF(tip) &&
+		    ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
+			return EINVAL;
+
+		if (XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <=
+		    XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
+			return EINVAL;
+	}
 
 	return 0;
 }
@@ -349,16 +358,6 @@ xfs_swap_extents(
 	*tifp = *tempifp;	/* struct copy */
 
 	/*
-	 * Fix the in-memory data fork values that are dependent on the fork
-	 * offset in the inode. We can't assume they remain the same as attr2
-	 * has dynamic fork offsets.
-	 */
-	ifp->if_ext_max = XFS_IFORK_SIZE(ip, XFS_DATA_FORK) /
-					(uint)sizeof(xfs_bmbt_rec_t);
-	tifp->if_ext_max = XFS_IFORK_SIZE(tip, XFS_DATA_FORK) /
-					(uint)sizeof(xfs_bmbt_rec_t);
-
-	/*
 	 * Fix the on-disk inode values
 	 */
 	tmp = (__uint64_t)ip->i_d.di_nblocks;
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c	2011-12-12 10:33:55.772030077 -0800
+++ xfs/fs/xfs/xfs_iget.c	2011-12-14 05:15:20.612373687 -0800
@@ -451,8 +451,6 @@ again:
 
 	*ipp = ip;
 
-	ASSERT(ip->i_df.if_ext_max ==
-	       XFS_IFORK_DSIZE(ip) / sizeof(xfs_bmbt_rec_t));
 	/*
 	 * If we have a real type for an on-disk inode, we can set ops(&unlock)
 	 * now.	 If it's a new inode being created, xfs_ialloc will handle it.
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-12-14 05:14:09.522758812 -0800
+++ xfs/fs/xfs/xfs_inode.c	2011-12-14 05:15:20.612373687 -0800
@@ -299,11 +299,8 @@ xfs_iformat(
 {
 	xfs_attr_shortform_t	*atp;
 	int			size;
-	int			error;
+	int			error = 0;
 	xfs_fsize_t             di_size;
-	ip->i_df.if_ext_max =
-		XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
-	error = 0;
 
 	if (unlikely(be32_to_cpu(dip->di_nextents) +
 		     be16_to_cpu(dip->di_anextents) >
@@ -409,10 +406,10 @@ xfs_iformat(
 	}
 	if (!XFS_DFORK_Q(dip))
 		return 0;
+
 	ASSERT(ip->i_afp == NULL);
 	ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP | KM_NOFS);
-	ip->i_afp->if_ext_max =
-		XFS_IFORK_ASIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
+
 	switch (dip->di_aformat) {
 	case XFS_DINODE_FMT_LOCAL:
 		atp = (xfs_attr_shortform_t *)XFS_DFORK_APTR(dip);
@@ -604,10 +601,11 @@ xfs_iformat_btree(
 	 * or the number of extents is greater than the number of
 	 * blocks.
 	 */
-	if (unlikely(XFS_IFORK_NEXTENTS(ip, whichfork) <= ifp->if_ext_max
-	    || XFS_BMDR_SPACE_CALC(nrecs) >
-			XFS_DFORK_SIZE(dip, ip->i_mount, whichfork)
-	    || XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
+	if (unlikely(XFS_IFORK_NEXTENTS(ip, whichfork) <=
+			XFS_IFORK_MAXEXT(ip, whichfork) ||
+		     XFS_BMDR_SPACE_CALC(nrecs) >
+			XFS_DFORK_SIZE(dip, ip->i_mount, whichfork) ||
+		     XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
 		xfs_warn(ip->i_mount, "corrupt inode %Lu (btree).",
 			(unsigned long long) ip->i_ino);
 		XFS_CORRUPTION_ERROR("xfs_iformat_btree", XFS_ERRLEVEL_LOW,
@@ -835,12 +833,6 @@ xfs_iread(
 		 * with the uninitialized part of it.
 		 */
 		ip->i_d.di_mode = 0;
-		/*
-		 * Initialize the per-fork minima and maxima for a new
-		 * inode here.  xfs_iformat will do it for old inodes.
-		 */
-		ip->i_df.if_ext_max =
-			XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
 	}
 
 	/*
@@ -1740,8 +1732,6 @@ xfs_ifree(
 	ip->i_d.di_flags = 0;
 	ip->i_d.di_dmevmask = 0;
 	ip->i_d.di_forkoff = 0;		/* mark the attr fork not in use */
-	ip->i_df.if_ext_max =
-		XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
 	ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
 	ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
 	/*
@@ -2408,7 +2398,7 @@ xfs_iflush(
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(!completion_done(&ip->i_flush));
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
-	       ip->i_d.di_nextents > ip->i_df.if_ext_max);
+	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
 
 	iip = ip->i_itemp;
 	mp = ip->i_mount;
@@ -2524,7 +2514,7 @@ xfs_iflush_int(
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(!completion_done(&ip->i_flush));
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
-	       ip->i_d.di_nextents > ip->i_df.if_ext_max);
+	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
 
 	iip = ip->i_itemp;
 	mp = ip->i_mount;
Index: xfs/fs/xfs/xfs_inode_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode_item.c	2011-12-12 10:33:55.795363284 -0800
+++ xfs/fs/xfs/xfs_inode_item.c	2011-12-14 05:15:20.615707002 -0800
@@ -79,8 +79,6 @@ xfs_inode_item_size(
 		break;
 
 	case XFS_DINODE_FMT_BTREE:
-		ASSERT(ip->i_df.if_ext_max ==
-		       XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t));
 		iip->ili_format.ilf_fields &=
 			~(XFS_ILOG_DDATA | XFS_ILOG_DEXT |
 			  XFS_ILOG_DEV | XFS_ILOG_UUID);
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h	2011-12-14 05:14:09.526092127 -0800
+++ xfs/fs/xfs/xfs_trace.h	2011-12-14 05:15:20.615707002 -0800
@@ -1568,7 +1568,6 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_clas
 		__field(xfs_ino_t, ino)
 		__field(int, format)
 		__field(int, nex)
-		__field(int, max_nex)
 		__field(int, broot_size)
 		__field(int, fork_off)
 	),
@@ -1578,18 +1577,16 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_clas
 		__entry->ino = ip->i_ino;
 		__entry->format = ip->i_d.di_format;
 		__entry->nex = ip->i_d.di_nextents;
-		__entry->max_nex = ip->i_df.if_ext_max;
 		__entry->broot_size = ip->i_df.if_broot_bytes;
 		__entry->fork_off = XFS_IFORK_BOFF(ip);
 	),
 	TP_printk("dev %d:%d ino 0x%llx (%s), %s format, num_extents %d, "
-		  "Max in-fork extents %d, broot size %d, fork offset %d",
+		  "broot size %d, fork offset %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __print_symbolic(__entry->which, XFS_SWAPEXT_INODES),
 		  __print_symbolic(__entry->format, XFS_INODE_FORMAT_STR),
 		  __entry->nex,
-		  __entry->max_nex,
 		  __entry->broot_size,
 		  __entry->fork_off)
 )
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-12-14 05:15:18.779050287 -0800
+++ xfs/fs/xfs/xfs_inode.h	2011-12-14 05:15:20.615707002 -0800
@@ -66,7 +66,6 @@ typedef struct xfs_ifork {
 	struct xfs_btree_block	*if_broot;	/* file's incore btree root */
 	short			if_broot_bytes;	/* bytes allocated for root */
 	unsigned char		if_flags;	/* per-fork flags */
-	unsigned char		if_ext_max;	/* max # of extent records */
 	union {
 		xfs_bmbt_rec_host_t *if_extents;/* linear map file exts */
 		xfs_ext_irec_t	*if_ext_irec;	/* irec map file exts */
@@ -206,7 +205,8 @@ typedef struct xfs_icdinode {
 	((w) == XFS_DATA_FORK ? \
 		((ip)->i_d.di_nextents = (n)) : \
 		((ip)->i_d.di_anextents = (n)))
-
+#define XFS_IFORK_MAXEXT(ip, w) \
+	(XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t))
 
 
 #ifdef __KERNEL__


* [PATCH 05/11] xfs: make i_flags an unsigned long
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2011-12-18 20:00 ` [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock Christoph Hellwig
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-use-i_flags --]
[-- Type: text/plain, Size: 1314 bytes --]

To be used for bit wakeup i_flags needs to be an unsigned long or we'll
run into trouble on big endian systems.  Because of the 1-byte
i_update_core field right after it this actually causes a fairly large
size increase on its own (4 or 8 bytes), but that increase will be more
than offset by the next two patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

---
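
Not part of the patch: a rough illustration of where the 4 or 8 bytes
come from, using two hypothetical structs that copy only the three
adjacent members from struct xfs_inode.  The real structure has more
members around these, so the numbers are an approximation under that
assumption and on common 32-bit and 64-bit ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* The three adjacent members before the change. */
struct misc_state_before {
	unsigned short	i_flags;
	unsigned char	i_update_core;
	unsigned int	i_delayed_blks;
};

/* The same members with i_flags widened to unsigned long. */
struct misc_state_after {
	unsigned long	i_flags;
	unsigned char	i_update_core;
	unsigned int	i_delayed_blks;
};

/*
 * Before, the short and the char share one 4-byte slot ahead of the int.
 * After, the char can no longer pack next to i_flags, so it takes a
 * padded slot of its own and the struct grows by one unsigned long.
 */
static size_t misc_state_growth(void)
{
	return sizeof(struct misc_state_after) -
	       sizeof(struct misc_state_before);
}
```
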
 fs/xfs/xfs_inode.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:58:58.449770178 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-11-30 12:59:05.013067955 +0100
@@ -242,7 +242,7 @@ typedef struct xfs_inode {
 	wait_queue_head_t	i_ipin_wait;	/* inode pinning wait queue */
 	spinlock_t		i_flags_lock;	/* inode i_flags lock */
 	/* Miscellaneous state. */
-	unsigned short		i_flags;	/* see defined flags below */
+	unsigned long		i_flags;	/* see defined flags below */
 	unsigned char		i_update_core;	/* timestamps/size is dirty */
 	unsigned int		i_delayed_blks;	/* count of delay alloc blks */
 


* [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 05/11] xfs: make i_flags an unsigned long Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-13 21:49   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue Christoph Hellwig
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-i_flush --]
[-- Type: text/plain, Size: 9457 bytes --]

We almost never block on i_flock; the exception is synchronous inode
flushing.  Instead of bloating the inode with a 16/24-byte completion
that we abuse as a semaphore, just implement it as a bitlock that uses
a bit waitqueue for the rare sleeping path.  This is primarily a
tradeoff between a much smaller inode and a faster non-blocking
path vs faster wakeups, and we are much better off with the former.

A small downside is that we will lose lockdep checking for i_flock, but
given that it's always taken inside the ilock that should be acceptable.

Note that for example the inode writeback locking is implemented in a
very similar way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>

---
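
Not part of the patch: a user-space sketch of the fast path / slow path
split, assuming a hypothetical struct inode_model and bit value.  C11
atomics stand in for the i_flags_lock protected test-and-set, and
sched_yield() stands in for sleeping on the bit waitqueue, so this shows
the shape of the lock rather than the kernel implementation.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define IFLOCK_BIT	0		/* hypothetical bit number */
#define IFLOCK		(1UL << IFLOCK_BIT)

/* Hypothetical stand-in for the inode: only the flags word matters. */
struct inode_model {
	atomic_ulong	i_flags;
};

/* Fast path: one atomic test-and-set, true if we took the lock. */
static bool iflock_nowait(struct inode_model *ip)
{
	return !(atomic_fetch_or(&ip->i_flags, IFLOCK) & IFLOCK);
}

static bool isiflocked(struct inode_model *ip)
{
	return (atomic_load(&ip->i_flags) & IFLOCK) != 0;
}

/* Slow path: wait while the bit is set, then retry the fast path. */
static void iflock(struct inode_model *ip)
{
	while (!iflock_nowait(ip)) {
		while (isiflocked(ip))
			sched_yield();	/* the kernel would sleep here */
	}
}

/* Release: clear the bit; the kernel would also wake a waiter. */
static void ifunlock(struct inode_model *ip)
{
	assert(isiflocked(ip));
	atomic_fetch_and(&ip->i_flags, ~IFLOCK);
}
```

The lock state costs a single bit in a word the inode already carries,
which is the size win argued for above; the price is that a waiter has
to go through a shared wait mechanism instead of a per-inode one.
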
 fs/xfs/xfs_iget.c       |   20 +++++++++++-
 fs/xfs/xfs_inode.c      |    4 +-
 fs/xfs/xfs_inode.h      |   78 ++++++++++++++++++++++++++++++------------------
 fs/xfs/xfs_inode_item.c |    4 +-
 fs/xfs/xfs_super.c      |    7 ----
 fs/xfs/xfs_sync.c       |    9 ++---
 6 files changed, 76 insertions(+), 46 deletions(-)

Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c	2011-12-18 08:05:01.459974561 -0800
+++ xfs/fs/xfs/xfs_iget.c	2011-12-18 08:06:23.263307404 -0800
@@ -77,7 +77,7 @@ xfs_inode_alloc(
 
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
 	ASSERT(!spin_is_locked(&ip->i_flags_lock));
-	ASSERT(completion_done(&ip->i_flush));
+	ASSERT(!xfs_isiflocked(ip));
 	ASSERT(ip->i_ino == 0);
 
 	mrlock_init(&ip->i_iolock, MRLOCK_BARRIER, "xfsio", ip->i_ino);
@@ -151,7 +151,7 @@ xfs_inode_free(
 	/* asserts to verify all state is correct here */
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
 	ASSERT(!spin_is_locked(&ip->i_flags_lock));
-	ASSERT(completion_done(&ip->i_flush));
+	ASSERT(!xfs_isiflocked(ip));
 
 	/*
 	 * Because we use RCU freeing we need to ensure the inode always
@@ -714,3 +714,19 @@ xfs_isilocked(
 	return 0;
 }
 #endif
+
+void
+__xfs_iflock(
+	struct xfs_inode	*ip)
+{
+	wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IFLOCK_BIT);
+	DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IFLOCK_BIT);
+
+	do {
+		prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+		if (xfs_isiflocked(ip))
+			io_schedule();
+	} while (!xfs_iflock_nowait(ip));
+
+	finish_wait(wq, &wait.wait);
+}
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-12-18 08:05:01.469974560 -0800
+++ xfs/fs/xfs/xfs_inode.c	2011-12-18 08:06:23.266640737 -0800
@@ -2396,7 +2396,7 @@ xfs_iflush(
 	XFS_STATS_INC(xs_iflush_count);
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
-	ASSERT(!completion_done(&ip->i_flush));
+	ASSERT(xfs_isiflocked(ip));
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
 
@@ -2512,7 +2512,7 @@ xfs_iflush_int(
 #endif
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
-	ASSERT(!completion_done(&ip->i_flush));
+	ASSERT(xfs_isiflocked(ip));
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
 
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-12-18 08:05:01.483307894 -0800
+++ xfs/fs/xfs/xfs_inode.h	2011-12-18 08:10:46.903305830 -0800
@@ -237,7 +237,6 @@ typedef struct xfs_inode {
 	struct xfs_inode_log_item *i_itemp;	/* logging information */
 	mrlock_t		i_lock;		/* inode lock */
 	mrlock_t		i_iolock;	/* inode IO lock */
-	struct completion	i_flush;	/* inode flush completion q */
 	atomic_t		i_pincount;	/* inode pin count */
 	wait_queue_head_t	i_ipin_wait;	/* inode pinning wait queue */
 	spinlock_t		i_flags_lock;	/* inode i_flags lock */
@@ -324,6 +323,19 @@ xfs_iflags_test_and_clear(xfs_inode_t *i
 	return ret;
 }
 
+static inline int
+xfs_iflags_test_and_set(xfs_inode_t *ip, unsigned short flags)
+{
+	int ret;
+
+	spin_lock(&ip->i_flags_lock);
+	ret = ip->i_flags & flags;
+	if (!ret)
+		ip->i_flags |= flags;
+	spin_unlock(&ip->i_flags_lock);
+	return ret;
+}
+
 /*
  * Project quota id helpers (previously projid was 16bit only
  * and using two 16bit values to hold new 32bit projid was chosen
@@ -344,35 +356,17 @@ xfs_set_projid(struct xfs_inode *ip,
 }
 
 /*
- * Manage the i_flush queue embedded in the inode.  This completion
- * queue synchronizes processes attempting to flush the in-core
- * inode back to disk.
- */
-static inline void xfs_iflock(xfs_inode_t *ip)
-{
-	wait_for_completion(&ip->i_flush);
-}
-
-static inline int xfs_iflock_nowait(xfs_inode_t *ip)
-{
-	return try_wait_for_completion(&ip->i_flush);
-}
-
-static inline void xfs_ifunlock(xfs_inode_t *ip)
-{
-	complete(&ip->i_flush);
-}
-
-/*
  * In-core inode flags.
  */
-#define XFS_IRECLAIM		0x0001  /* started reclaiming this inode */
-#define XFS_ISTALE		0x0002	/* inode has been staled */
-#define XFS_IRECLAIMABLE	0x0004	/* inode can be reclaimed */
-#define XFS_INEW		0x0008	/* inode has just been allocated */
-#define XFS_IFILESTREAM		0x0010	/* inode is in a filestream directory */
-#define XFS_ITRUNCATED		0x0020	/* truncated down so flush-on-close */
-#define XFS_IDIRTY_RELEASE	0x0040	/* dirty release already seen */
+#define XFS_IRECLAIM		(1 << 0) /* started reclaiming this inode */
+#define XFS_ISTALE		(1 << 1) /* inode has been staled */
+#define XFS_IRECLAIMABLE	(1 << 2) /* inode can be reclaimed */
+#define XFS_INEW		(1 << 3) /* inode has just been allocated */
+#define XFS_IFILESTREAM		(1 << 4) /* inode is in a filestream dir. */
+#define XFS_ITRUNCATED		(1 << 5) /* truncated down so flush-on-close */
+#define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
+#define __XFS_IFLOCK_BIT	7	 /* inode is being flushed right now */
+#define XFS_IFLOCK		(1 << __XFS_IFLOCK_BIT)
 
 /*
  * Per-lifetime flags need to be reset when re-using a reclaimable inode during
@@ -385,6 +379,34 @@ static inline void xfs_ifunlock(xfs_inod
 	 XFS_IFILESTREAM);
 
 /*
+ * Synchronize processes attempting to flush the in-core inode back to disk.
+ */
+
+extern void __xfs_iflock(struct xfs_inode *ip);
+
+static inline int xfs_iflock_nowait(struct xfs_inode *ip)
+{
+	return !xfs_iflags_test_and_set(ip, XFS_IFLOCK);
+}
+
+static inline void xfs_iflock(struct xfs_inode *ip)
+{
+	if (!xfs_iflock_nowait(ip))
+		__xfs_iflock(ip);
+}
+
+static inline void xfs_ifunlock(struct xfs_inode *ip)
+{
+	xfs_iflags_clear(ip, XFS_IFLOCK);
+	wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
+}
+
+static inline int xfs_isiflocked(struct xfs_inode *ip)
+{
+	return xfs_iflags_test(ip, XFS_IFLOCK);
+}
+
+/*
  * Flags for inode locking.
  * Bit ranges:	1<<1  - 1<<16-1 -- iolock/ilock modes (bitfield)
  *		1<<16 - 1<<32-1 -- lockdep annotation (integers)
Index: xfs/fs/xfs/xfs_inode_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode_item.c	2011-12-18 08:05:01.493307894 -0800
+++ xfs/fs/xfs/xfs_inode_item.c	2011-12-18 08:06:23.266640737 -0800
@@ -717,7 +717,7 @@ xfs_inode_item_pushbuf(
 	 * If a flush is not in progress anymore, chances are that the
 	 * inode was taken off the AIL. So, just get out.
 	 */
-	if (completion_done(&ip->i_flush) ||
+	if (!xfs_isiflocked(ip) ||
 	    !(lip->li_flags & XFS_LI_IN_AIL)) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		return true;
@@ -750,7 +750,7 @@ xfs_inode_item_push(
 	struct xfs_inode	*ip = iip->ili_inode;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED));
-	ASSERT(!completion_done(&ip->i_flush));
+	ASSERT(xfs_isiflocked(ip));
 
 	/*
 	 * Since we were able to lock the inode's flush lock and
Index: xfs/fs/xfs/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/xfs_super.c	2011-12-18 08:05:01.506641227 -0800
+++ xfs/fs/xfs/xfs_super.c	2011-12-18 08:06:23.269974071 -0800
@@ -829,13 +829,6 @@ xfs_fs_inode_init_once(
 	atomic_set(&ip->i_pincount, 0);
 	spin_lock_init(&ip->i_flags_lock);
 	init_waitqueue_head(&ip->i_ipin_wait);
-	/*
-	 * Because we want to use a counting completion, complete
-	 * the flush completion once to allow a single access to
-	 * the flush completion without blocking.
-	 */
-	init_completion(&ip->i_flush);
-	complete(&ip->i_flush);
 
 	mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
 		     "xfsino", ip->i_ino);
Index: xfs/fs/xfs/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.c	2011-12-18 08:05:01.519974559 -0800
+++ xfs/fs/xfs/xfs_sync.c	2011-12-18 08:06:23.269974071 -0800
@@ -671,14 +671,13 @@ xfs_reclaim_inode_grab(
 		return 1;
 
 	/*
-	 * do some unlocked checks first to avoid unnecessary lock traffic.
-	 * The first is a flush lock check, the second is a already in reclaim
-	 * check. Only do these checks if we are not going to block on locks.
+	 * If we are asked for non-blocking operation, do unlocked checks to
+	 * see if the inode already is being flushed or in reclaim to avoid
+	 * lock traffic.
 	 */
 	if ((flags & SYNC_TRYLOCK) &&
-	    (!ip->i_flush.done || __xfs_iflags_test(ip, XFS_IRECLAIM))) {
+	    __xfs_iflags_test(ip, XFS_IFLOCK | XFS_IRECLAIM))
 		return 1;
-	}
 
 	/*
 	 * The radix tree lock here protects a thread in xfs_iget from racing

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-13 22:42   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode Christoph Hellwig
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-i_ipin_wait --]
[-- Type: text/plain, Size: 4132 bytes --]

Replace i_ipin_wait, which is only used during synchronous inode flushing,
with a bit waitqueue.  This trades off a much smaller inode against
slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
bytes in the XFS inode.
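
To illustrate the tradeoff, here is a rough userspace sketch of waiting for
a pin count to drop to zero on a shared, hashed wait table instead of a
per-object wait queue.  It assumes pthreads and C11 atomics in place of the
kernel primitives; the table size, the hash and all demo_* names are
hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WAIT_SLOTS	64	/* hypothetical size; must be a power of two */

struct demo_waitslot {
	pthread_mutex_t	lock;
	pthread_cond_t	cond;
};

struct demo_pinned {
	atomic_int	pincount;
	unsigned long	flags;	/* only its address is used, as the wait key */
};

/* One table shared by all objects, so no object embeds a wait queue. */
static struct demo_waitslot demo_wait_table[DEMO_WAIT_SLOTS];

static void demo_wait_table_init(void)
{
	for (size_t i = 0; i < DEMO_WAIT_SLOTS; i++) {
		pthread_mutex_init(&demo_wait_table[i].lock, NULL);
		pthread_cond_init(&demo_wait_table[i].cond, NULL);
	}
}

static void demo_pinned_init(struct demo_pinned *ip)
{
	atomic_init(&ip->pincount, 0);
	ip->flags = 0;
}

/* Map a (word address, bit number) pair to a shared slot. */
static struct demo_waitslot *demo_bit_waitslot(const void *word,
					       unsigned int bit)
{
	uintptr_t key = ((uintptr_t)word >> 3) + bit;

	return &demo_wait_table[key & (DEMO_WAIT_SLOTS - 1)];
}

static void demo_pin(struct demo_pinned *ip)
{
	atomic_fetch_add(&ip->pincount, 1);
}

/* Drop one pin; the last unpin wakes everybody sleeping on the slot. */
static void demo_unpin(struct demo_pinned *ip)
{
	if (atomic_fetch_sub(&ip->pincount, 1) == 1) {
		struct demo_waitslot *ws = demo_bit_waitslot(&ip->flags, 8);

		pthread_mutex_lock(&ws->lock);
		pthread_cond_broadcast(&ws->cond);
		pthread_mutex_unlock(&ws->lock);
	}
}

/* Sleep until the pin count reaches zero. */
static void demo_unpin_wait(struct demo_pinned *ip)
{
	struct demo_waitslot *ws = demo_bit_waitslot(&ip->flags, 8);

	pthread_mutex_lock(&ws->lock);
	while (atomic_load(&ip->pincount) != 0)
		pthread_cond_wait(&ws->cond, &ws->lock);
	pthread_mutex_unlock(&ws->lock);
}
```

Because unrelated objects can hash to the same slot, a wakeup may reach
sleepers whose condition is still false; they recheck and sleep again.  That
is the "slightly slower wakeup" side of the trade.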

Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_inode.c      |   27 +++++++++++++++++++++------
 fs/xfs/xfs_inode.h      |    3 ++-
 fs/xfs/xfs_inode_item.c |    2 +-
 fs/xfs/xfs_super.c      |    1 -
 4 files changed, 24 insertions(+), 9 deletions(-)

Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-11-30 12:59:08.843047205 +0100
+++ xfs/fs/xfs/xfs_inode.c	2011-11-30 12:59:10.296372665 +0100
@@ -2037,7 +2037,7 @@ xfs_idestroy_fork(
  * once someone is waiting for it to be unpinned.
  */
 static void
-xfs_iunpin_nowait(
+xfs_iunpin(
 	struct xfs_inode	*ip)
 {
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
@@ -2049,14 +2049,29 @@ xfs_iunpin_nowait(
 
 }
 
+static void
+__xfs_iunpin_wait(
+	struct xfs_inode	*ip)
+{
+	wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IPINNED_BIT);
+	DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IPINNED_BIT);
+
+	xfs_iunpin(ip);
+
+	do {
+		prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+		if (xfs_ipincount(ip))
+			io_schedule();
+	} while (xfs_ipincount(ip));
+	finish_wait(wq, &wait.wait);
+}
+
 void
 xfs_iunpin_wait(
 	struct xfs_inode	*ip)
 {
-	if (xfs_ipincount(ip)) {
-		xfs_iunpin_nowait(ip);
-		wait_event(ip->i_ipin_wait, (xfs_ipincount(ip) == 0));
-	}
+	if (xfs_ipincount(ip))
+		__xfs_iunpin_wait(ip);
 }
 
 /*
@@ -2415,7 +2430,7 @@ xfs_iflush(
 	 * out for us if they occur after the log force completes.
 	 */
 	if (!(flags & SYNC_WAIT) && xfs_ipincount(ip)) {
-		xfs_iunpin_nowait(ip);
+		xfs_iunpin(ip);
 		xfs_ifunlock(ip);
 		return EAGAIN;
 	}
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:59:08.846380520 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-11-30 12:59:10.296372665 +0100
@@ -238,7 +238,6 @@ typedef struct xfs_inode {
 	mrlock_t		i_lock;		/* inode lock */
 	mrlock_t		i_iolock;	/* inode IO lock */
 	atomic_t		i_pincount;	/* inode pin count */
-	wait_queue_head_t	i_ipin_wait;	/* inode pinning wait queue */
 	spinlock_t		i_flags_lock;	/* inode i_flags lock */
 	/* Miscellaneous state. */
 	unsigned long		i_flags;	/* see defined flags below */
@@ -367,6 +366,8 @@ xfs_set_projid(struct xfs_inode *ip,
 #define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
 #define __XFS_IFLOCK_BIT	7	 /* inode is being flushed right now */
 #define XFS_IFLOCK		(1 << __XFS_IFLOCK_BIT)
+#define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
+#define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
 
 /*
  * Per-lifetime flags need to be reset when re-using a reclaimable inode during
Index: xfs/fs/xfs/xfs_inode_item.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode_item.c	2011-11-30 12:59:08.846380520 +0100
+++ xfs/fs/xfs/xfs_inode_item.c	2011-11-30 12:59:10.296372665 +0100
@@ -555,7 +555,7 @@ xfs_inode_item_unpin(
 	trace_xfs_inode_unpin(ip, _RET_IP_);
 	ASSERT(atomic_read(&ip->i_pincount) > 0);
 	if (atomic_dec_and_test(&ip->i_pincount))
-		wake_up(&ip->i_ipin_wait);
+		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
 }
 
 /*
Index: xfs/fs/xfs/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/xfs_super.c	2011-11-30 12:59:08.846380520 +0100
+++ xfs/fs/xfs/xfs_super.c	2011-11-30 12:59:10.296372665 +0100
@@ -828,7 +828,6 @@ xfs_fs_inode_init_once(
 	/* xfs inode */
 	atomic_set(&ip->i_pincount, 0);
 	spin_lock_init(&ip->i_flags_lock);
-	init_waitqueue_head(&ip->i_ipin_wait);
 
 	mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
 		     "xfsino", ip->i_ino);

* [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-16 18:32   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-i_size --]
[-- Type: text/plain, Size: 18089 bytes --]

There is no fundamental need to keep an in-memory inode size copy in the XFS
inode.  We already have the on-disk value in the dinode, and the separate
in-memory copy, which we need for regular files only, is in the VFS inode.

Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
VFS inode i_size field for regular files.  Switch code that was directly
accessing the i_size field in the xfs_inode to XFS_ISIZE or, in cases where
we are limited to regular files, to direct access of the VFS inode i_size
field.
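
As a rough standalone sketch of the size lookup rule described above, with
mock structures in place of the real inodes: the is_regular field stands in
for the S_ISREG() check on the mode, and all demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_vfs_inode {
	int64_t	i_size;		/* in-memory size kept by the VFS */
};

struct demo_xfs_inode {
	bool			is_regular;	/* stands in for S_ISREG(di_mode) */
	int64_t			di_size;	/* size as recorded on disk */
	struct demo_vfs_inode	vnode;		/* embedded VFS inode */
};

/*
 * Regular files report the VFS size, which may run ahead of the on-disk
 * size until data is written back; everything else reports di_size.
 */
static int64_t demo_isize(const struct demo_xfs_inode *ip)
{
	if (ip->is_regular)
		return ip->vnode.i_size;
	return ip->di_size;
}
```

Writing the helper as a function rather than a conditional-expression macro
also means it can be used inside larger expressions such as
"demo_isize(ip) - 1" without precedence surprises.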

This also allows dropping some fairly complicated code in the write path
which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
that is getting updated inside ->write_end.

Note that we do not bother resetting the VFS i_size to zero when truncating
a file that gets freed, as there is no point in doing so: the VFS inode is
no longer in use at this point.  Just relax the assert in xfs_ifree to
only check the on-disk size instead.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_aops.c        |    2 +-
 fs/xfs/xfs_bmap.c        |   15 ++++++---------
 fs/xfs/xfs_file.c        |   45 +++++++++++----------------------------------
 fs/xfs/xfs_fs_subr.c     |    2 +-
 fs/xfs/xfs_iget.c        |    1 -
 fs/xfs/xfs_inode.c       |    8 ++------
 fs/xfs/xfs_inode.h       |   16 ++++++++++++----
 fs/xfs/xfs_iomap.c       |   12 ++++++------
 fs/xfs/xfs_iops.c        |    3 +--
 fs/xfs/xfs_qm_syscalls.c |    1 -
 fs/xfs/xfs_trace.h       |    2 +-
 fs/xfs/xfs_vnodeops.c    |   31 +++++++++++++++----------------
 12 files changed, 56 insertions(+), 82 deletions(-)

Index: xfs/fs/xfs/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_aops.c	2011-12-02 19:39:30.743827732 +0100
+++ xfs/fs/xfs/xfs_aops.c	2011-12-07 11:18:04.305981907 +0100
@@ -111,7 +111,7 @@ xfs_ioend_new_eof(
 	xfs_fsize_t		bsize;
 
 	bsize = ioend->io_offset + ioend->io_size;
-	isize = MAX(ip->i_size, ip->i_new_size);
+	isize = MAX(i_size_read(VFS_I(ip)), ip->i_new_size);
 	isize = MIN(isize, bsize);
 	return isize > ip->i_d.di_size ? isize : 0;
 }
Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c	2011-12-02 19:39:30.753827732 +0100
+++ xfs/fs/xfs/xfs_file.c	2011-12-07 11:18:04.305981907 +0100
@@ -327,7 +327,7 @@ xfs_file_aio_read(
 				mp->m_rtdev_targp : mp->m_ddev_targp;
 		if ((iocb->ki_pos & target->bt_smask) ||
 		    (size & target->bt_smask)) {
-			if (iocb->ki_pos == ip->i_size)
+			if (iocb->ki_pos == i_size_read(inode))
 				return 0;
 			return -XFS_ERROR(EINVAL);
 		}
@@ -412,30 +412,6 @@ xfs_file_splice_read(
 	return ret;
 }
 
-STATIC void
-xfs_aio_write_isize_update(
-	struct inode	*inode,
-	loff_t		*ppos,
-	ssize_t		bytes_written)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	xfs_fsize_t		isize = i_size_read(inode);
-
-	if (bytes_written > 0)
-		XFS_STATS_ADD(xs_write_bytes, bytes_written);
-
-	if (unlikely(bytes_written < 0 && bytes_written != -EFAULT &&
-					*ppos > isize))
-		*ppos = isize;
-
-	if (*ppos > ip->i_size) {
-		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
-		if (*ppos > ip->i_size)
-			ip->i_size = *ppos;
-		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-	}
-}
-
 /*
  * If this was a direct or synchronous I/O that failed (such as ENOSPC) then
  * part of the I/O may have been written to disk before the error occurred.  In
@@ -451,8 +427,8 @@ xfs_aio_write_newsize_update(
 		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
 		if (new_size == ip->i_new_size)
 			ip->i_new_size = 0;
-		if (ip->i_d.di_size > ip->i_size)
-			ip->i_d.di_size = ip->i_size;
+		if (ip->i_d.di_size > i_size_read(VFS_I(ip)))
+			ip->i_d.di_size = i_size_read(VFS_I(ip));
 		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 	}
 }
@@ -492,15 +468,16 @@ xfs_file_splice_write(
 	new_size = *ppos + count;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if (new_size > ip->i_size)
+	if (new_size > i_size_read(inode))
 		ip->i_new_size = new_size;
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
 	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
+	if (ret > 0)
+		XFS_STATS_ADD(xs_write_bytes, ret);
 
-	xfs_aio_write_isize_update(inode, ppos, ret);
 	xfs_aio_write_newsize_update(ip, new_size);
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 	return ret;
@@ -728,14 +705,14 @@ restart:
 	 * values are still valid.
 	 */
 	if ((ip->i_new_size && *pos > ip->i_new_size) ||
-	    (!ip->i_new_size && *pos > ip->i_size)) {
+	    (!ip->i_new_size && *pos > i_size_read(inode))) {
 		if (*iolock == XFS_IOLOCK_SHARED) {
 			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
 			*iolock = XFS_IOLOCK_EXCL;
 			xfs_rw_ilock(ip, XFS_ILOCK_EXCL | *iolock);
 			goto restart;
 		}
-		error = -xfs_zero_eof(ip, *pos, ip->i_size);
+		error = -xfs_zero_eof(ip, *pos, i_size_read(inode));
 	}
 
 	/*
@@ -744,7 +721,7 @@ restart:
 	 * ip->i_new_size if this IO ends beyond any other in-flight writes.
 	 */
 	new_size = *pos + *count;
-	if (new_size > ip->i_size) {
+	if (new_size > i_size_read(inode)) {
 		if (new_size > ip->i_new_size)
 			ip->i_new_size = new_size;
 		*new_sizep = new_size;
@@ -957,11 +934,11 @@ xfs_file_aio_write(
 		ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
 						ocount, &new_size, &iolock);
 
-	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
-
 	if (ret <= 0)
 		goto out_unlock;
 
+	XFS_STATS_ADD(xs_write_bytes, ret);
+
 	/* Handle various SYNC-type writes */
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 		loff_t end = pos + ret - 1;
Index: xfs/fs/xfs/xfs_fs_subr.c
===================================================================
--- xfs.orig/fs/xfs/xfs_fs_subr.c	2011-12-02 19:39:30.763827732 +0100
+++ xfs/fs/xfs/xfs_fs_subr.c	2011-12-07 11:18:04.305981907 +0100
@@ -90,7 +90,7 @@ xfs_wait_on_pages(
 
 	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
 		return -filemap_fdatawait_range(mapping, first,
-					last == -1 ? ip->i_size - 1 : last);
+					last == -1 ? XFS_ISIZE(ip) - 1 : last);
 	}
 	return 0;
 }
Index: xfs/fs/xfs/xfs_iops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iops.c	2011-12-07 11:15:33.600131684 +0100
+++ xfs/fs/xfs/xfs_iops.c	2011-12-07 11:18:04.309315222 +0100
@@ -778,7 +778,7 @@ xfs_setattr_size(
 		lock_flags |= XFS_IOLOCK_EXCL;
 	xfs_ilock(ip, lock_flags);
 
-	oldsize = ip->i_size;
+	oldsize = inode->i_size;
 	newsize = iattr->ia_size;
 
 	/*
@@ -897,7 +897,6 @@ xfs_setattr_size(
 	 * they get written to.
 	 */
 	ip->i_d.di_size = newsize;
-	ip->i_size = newsize;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	if (newsize <= oldsize) {
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h	2011-12-07 11:17:02.346317572 +0100
+++ xfs/fs/xfs/xfs_trace.h	2011-12-07 11:18:04.309315222 +0100
@@ -1038,7 +1038,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 	TP_fast_assign(
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
 		__entry->ino = ip->i_ino;
-		__entry->isize = ip->i_size;
+		__entry->isize = VFS_I(ip)->i_size;
 		__entry->disize = ip->i_d.di_size;
 		__entry->new_size = ip->i_new_size;
 		__entry->offset = offset;
Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c	2011-12-07 11:17:02.339650941 +0100
+++ xfs/fs/xfs/xfs_bmap.c	2011-12-07 11:18:04.312648538 +0100
@@ -3997,11 +3997,8 @@ xfs_bmap_one_block(
 	xfs_bmbt_irec_t	s;		/* internal version of extent */
 
 #ifndef DEBUG
-	if (whichfork == XFS_DATA_FORK) {
-		return S_ISREG(ip->i_d.di_mode) ?
-			(ip->i_size == ip->i_mount->m_sb.sb_blocksize) :
-			(ip->i_d.di_size == ip->i_mount->m_sb.sb_blocksize);
-	}
+	if (whichfork == XFS_DATA_FORK)
+		return XFS_ISIZE(ip) == ip->i_mount->m_sb.sb_blocksize;
 #endif	/* !DEBUG */
 	if (XFS_IFORK_NEXTENTS(ip, whichfork) != 1)
 		return 0;
@@ -4013,7 +4010,7 @@ xfs_bmap_one_block(
 	xfs_bmbt_get_all(ep, &s);
 	rval = s.br_startoff == 0 && s.br_blockcount == 1;
 	if (rval && whichfork == XFS_DATA_FORK)
-		ASSERT(ip->i_size == ip->i_mount->m_sb.sb_blocksize);
+		ASSERT(XFS_ISIZE(ip) == ip->i_mount->m_sb.sb_blocksize);
 	return rval;
 }
 
@@ -5425,7 +5422,7 @@ xfs_getbmapx_fix_eof_hole(
 	if (startblock == HOLESTARTBLOCK) {
 		mp = ip->i_mount;
 		out->bmv_block = -1;
-		fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, ip->i_size));
+		fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, XFS_ISIZE(ip)));
 		fixlen -= out->bmv_offset;
 		if (prealloced && out->bmv_offset + out->bmv_length == end) {
 			/* Came to hole at EOF. Trim it. */
@@ -5513,7 +5510,7 @@ xfs_getbmap(
 			fixlen = XFS_MAXIOFFSET(mp);
 		} else {
 			prealloced = 0;
-			fixlen = ip->i_size;
+			fixlen = XFS_ISIZE(ip);
 		}
 	}
 
@@ -5542,7 +5539,7 @@ xfs_getbmap(
 
 	xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) {
-		if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
+		if (ip->i_delayed_blks || XFS_ISIZE(ip) > ip->i_d.di_size) {
 			error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
 			if (error)
 				goto out_unlock_iolock;
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c	2011-12-07 11:17:37.032796326 +0100
+++ xfs/fs/xfs/xfs_iget.c	2011-12-07 11:18:04.312648538 +0100
@@ -94,7 +94,6 @@ xfs_inode_alloc(
 	ip->i_update_core = 0;
 	ip->i_delayed_blks = 0;
 	memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
-	ip->i_size = 0;
 	ip->i_new_size = 0;
 
 	return ip;
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-12-07 11:17:50.642722594 +0100
+++ xfs/fs/xfs/xfs_inode.c	2011-12-07 11:18:04.315981854 +0100
@@ -347,7 +347,6 @@ xfs_iformat(
 			return XFS_ERROR(EFSCORRUPTED);
 		}
 		ip->i_d.di_size = 0;
-		ip->i_size = 0;
 		ip->i_df.if_u2.if_rdev = xfs_dinode_get_rdev(dip);
 		break;
 
@@ -853,7 +852,6 @@ xfs_iread(
 	}
 
 	ip->i_delayed_blks = 0;
-	ip->i_size = ip->i_d.di_size;
 
 	/*
 	 * Mark the buffer containing the inode as something to keep
@@ -1043,7 +1041,6 @@ xfs_ialloc(
 	}
 
 	ip->i_d.di_size = 0;
-	ip->i_size = 0;
 	ip->i_d.di_nextents = 0;
 	ASSERT(ip->i_d.di_nblocks == 0);
 
@@ -1198,7 +1195,7 @@ xfs_itruncate_extents(
 	int			done = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL));
-	ASSERT(new_size <= ip->i_size);
+	ASSERT(new_size <= XFS_ISIZE(ip));
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(ip->i_itemp != NULL);
 	ASSERT(ip->i_itemp->ili_lock_flags == 0);
@@ -1712,8 +1709,7 @@ xfs_ifree(
 	ASSERT(ip->i_d.di_nlink == 0);
 	ASSERT(ip->i_d.di_nextents == 0);
 	ASSERT(ip->i_d.di_anextents == 0);
-	ASSERT((ip->i_d.di_size == 0 && ip->i_size == 0) ||
-	       (!S_ISREG(ip->i_d.di_mode)));
+	ASSERT(ip->i_d.di_size == 0 || !S_ISREG(ip->i_d.di_mode));
 	ASSERT(ip->i_d.di_nblocks == 0);
 
 	/*
Index: xfs/fs/xfs/xfs_iomap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iomap.c	2011-12-07 11:15:53.693356164 +0100
+++ xfs/fs/xfs/xfs_iomap.c	2011-12-07 11:18:04.315981854 +0100
@@ -74,7 +74,7 @@ xfs_iomap_eof_align_last_fsb(
 		else if (mp->m_dalign)
 			align = mp->m_dalign;
 
-		if (align && ip->i_size >= XFS_FSB_TO_B(mp, align))
+		if (align && XFS_ISIZE(ip) >= XFS_FSB_TO_B(mp, align))
 			new_last_fsb = roundup_64(*last_fsb, align);
 	}
 
@@ -154,7 +154,7 @@ xfs_iomap_write_direct(
 
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
-	if ((offset + count) > ip->i_size) {
+	if ((offset + count) > XFS_ISIZE(ip)) {
 		error = xfs_iomap_eof_align_last_fsb(mp, ip, extsz, &last_fsb);
 		if (error)
 			goto error_out;
@@ -211,7 +211,7 @@ xfs_iomap_write_direct(
 	xfs_trans_ijoin(tp, ip, 0);
 
 	bmapi_flag = 0;
-	if (offset < ip->i_size || extsz)
+	if (offset < XFS_ISIZE(ip) || extsz)
 		bmapi_flag |= XFS_BMAPI_PREALLOC;
 
 	/*
@@ -286,7 +286,7 @@ xfs_iomap_eof_want_preallocate(
 	int		found_delalloc = 0;
 
 	*prealloc = 0;
-	if ((offset + count) <= ip->i_size)
+	if (offset + count <= XFS_ISIZE(ip))
 		return 0;
 
 	/*
@@ -340,7 +340,7 @@ xfs_iomap_prealloc_size(
 		 * if we pass in alloc_blocks = 0. Hence the "+ 1" to
 		 * ensure we always pass in a non-zero value.
 		 */
-		alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size) + 1;
+		alloc_blocks = XFS_B_TO_FSB(mp, XFS_ISIZE(ip)) + 1;
 		alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
 					rounddown_pow_of_two(alloc_blocks));
 
@@ -564,7 +564,7 @@ xfs_iomap_write_allocate(
 			 * back....
 			 */
 			nimaps = 1;
-			end_fsb = XFS_B_TO_FSB(mp, ip->i_size);
+			end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
 			error = xfs_bmap_last_offset(NULL, ip, &last_block,
 							XFS_DATA_FORK);
 			if (error)
Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-12-07 11:15:33.603464999 +0100
+++ xfs/fs/xfs/xfs_vnodeops.c	2011-12-07 11:18:04.315981854 +0100
@@ -175,7 +175,7 @@ xfs_free_eofblocks(
 	 * Figure out if there are any blocks beyond the end
 	 * of the file.  If not, then there is nothing to do.
 	 */
-	end_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)ip->i_size));
+	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
 	last_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_MAXIOFFSET(mp));
 	if (last_fsb <= end_fsb)
 		return 0;
@@ -233,7 +233,7 @@ xfs_free_eofblocks(
 		 * may be full of holes (ie NULL files bug).
 		 */
 		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
-					      ip->i_size);
+					      XFS_ISIZE(ip));
 		if (error) {
 			/*
 			 * If we get an error at this point we simply don't
@@ -547,8 +547,8 @@ xfs_release(
 		return 0;
 
 	if ((S_ISREG(ip->i_d.di_mode) &&
-	     ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
-	       ip->i_delayed_blks > 0)) &&
+	     (VFS_I(ip)->i_size > 0 ||
+	      (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) &&
 	     (ip->i_df.if_flags & XFS_IFEXTENTS))  &&
 	    (!(ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) {
 
@@ -625,7 +625,7 @@ xfs_inactive(
 	 * only one with a reference to the inode.
 	 */
 	truncate = ((ip->i_d.di_nlink == 0) &&
-	    ((ip->i_d.di_size != 0) || (ip->i_size != 0) ||
+	    ((ip->i_d.di_size != 0) || XFS_ISIZE(ip) != 0 ||
 	     (ip->i_d.di_nextents > 0) || (ip->i_delayed_blks > 0)) &&
 	    S_ISREG(ip->i_d.di_mode));
 
@@ -639,12 +639,12 @@ xfs_inactive(
 
 	if (ip->i_d.di_nlink != 0) {
 		if ((S_ISREG(ip->i_d.di_mode) &&
-                     ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
-                       ip->i_delayed_blks > 0)) &&
-		      (ip->i_df.if_flags & XFS_IFEXTENTS) &&
-		     (!(ip->i_d.di_flags &
+		    (VFS_I(ip)->i_size > 0 ||
+		     (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) &&
+		    (ip->i_df.if_flags & XFS_IFEXTENTS) &&
+		    (!(ip->i_d.di_flags &
 				(XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) ||
-		      (ip->i_delayed_blks != 0)))) {
+		     ip->i_delayed_blks != 0))) {
 			error = xfs_free_eofblocks(mp, ip, 0);
 			if (error)
 				return VN_INACTIVE_CACHE;
@@ -678,7 +678,6 @@ xfs_inactive(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		ip->i_d.di_size = 0;
-		ip->i_size = 0;
 		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
@@ -1974,11 +1973,11 @@ xfs_zero_remaining_bytes(
 	 * since nothing can read beyond eof.  The space will
 	 * be zeroed when the file is extended anyway.
 	 */
-	if (startoff >= ip->i_size)
+	if (startoff >= XFS_ISIZE(ip))
 		return 0;
 
-	if (endoff > ip->i_size)
-		endoff = ip->i_size;
+	if (endoff > XFS_ISIZE(ip))
+		endoff = XFS_ISIZE(ip);
 
 	bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp,
@@ -2273,7 +2272,7 @@ xfs_change_file_space(
 		bf->l_start += offset;
 		break;
 	case 2: /*SEEK_END*/
-		bf->l_start += ip->i_size;
+		bf->l_start += XFS_ISIZE(ip);
 		break;
 	default:
 		return XFS_ERROR(EINVAL);
@@ -2290,7 +2289,7 @@ xfs_change_file_space(
 	bf->l_whence = 0;
 
 	startoffset = bf->l_start;
-	fsize = ip->i_size;
+	fsize = XFS_ISIZE(ip);
 
 	/*
 	 * XFS_IOC_RESVSP and XFS_IOC_UNRESVSP will reserve or unreserve
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-12-07 11:17:50.642722594 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-12-07 11:18:04.319315170 +0100
@@ -246,16 +246,12 @@ typedef struct xfs_inode {
 
 	xfs_icdinode_t		i_d;		/* most of ondisk inode */
 
-	xfs_fsize_t		i_size;		/* in-memory size */
 	xfs_fsize_t		i_new_size;	/* size when write completes */
 
 	/* VFS inode */
 	struct inode		i_vnode;	/* embedded VFS inode */
 } xfs_inode_t;
 
-#define XFS_ISIZE(ip)	S_ISREG((ip)->i_d.di_mode) ? \
-				(ip)->i_size : (ip)->i_d.di_size;
-
 /* Convert from vfs inode to xfs inode */
 static inline struct xfs_inode *XFS_I(struct inode *inode)
 {
@@ -269,6 +265,18 @@ static inline struct inode *VFS_I(struct
 }
 
 /*
+ * For regular files we only update the on-disk filesize when actually
+ * writing data back to disk.  Until then only the copy in the VFS inode
+ * is uptodate.
+ */
+static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
+{
+	if (S_ISREG(ip->i_d.di_mode))
+		return i_size_read(VFS_I(ip));
+	return ip->i_d.di_size;
+}
+
+/*
  * i_flags helper functions
  */
 static inline void
Index: xfs/fs/xfs/xfs_qm_syscalls.c
===================================================================
--- xfs.orig/fs/xfs/xfs_qm_syscalls.c	2011-12-07 11:15:33.600131684 +0100
+++ xfs/fs/xfs/xfs_qm_syscalls.c	2011-12-07 11:18:04.319315170 +0100
@@ -265,7 +265,6 @@ xfs_qm_scall_trunc_qfile(
 	xfs_trans_ijoin(tp, ip, 0);
 
 	ip->i_d.di_size = 0;
-	ip->i_size = 0;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);

* [PATCH 09/11] xfs: remove the i_new_size field in struct xfs_inode
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2011-12-18 22:13   ` Dave Chinner
                     ` (2 more replies)
  2011-12-18 20:00 ` [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks Christoph Hellwig
  2011-12-18 20:00 ` [PATCH 11/11] xfs: cleanup xfs_file_aio_write Christoph Hellwig
  10 siblings, 3 replies; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-kill-i_new_size --]
[-- Type: text/plain, Size: 13732 bytes --]

Now that we use the VFS i_size field throughout XFS there is no need for the
i_new_size field any more given that the VFS i_size field gets updated
in ->write_end before unlocking the page, and thus is a) always uptodate when
writeback could see a page.  Removing i_new_size also has the advantage that
we will never have to trim back di_size during a failed buffered write,
given that it never gets updated past i_size.
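
As a rough illustration of the last point, here is a standalone userspace
model (not kernel code) of the size calculation done at I/O completion once
i_new_size is gone.  The names model_inode, vfs_size, disk_size and
model_ioend_new_eof are made up for this sketch; they stand in for the VFS
i_size, ip->i_d.di_size and xfs_ioend_new_eof.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t fsize_t;	/* hypothetical stand-in for xfs_fsize_t */

struct model_inode {
	fsize_t vfs_size;	/* models the VFS i_size */
	fsize_t disk_size;	/* models ip->i_d.di_size */
};

/*
 * Return the new on-disk size for a completed I/O, or 0 if the on-disk
 * size does not need to grow.  The result is clamped to vfs_size, so the
 * on-disk size can never be pushed past the in-core size.
 */
static fsize_t
model_ioend_new_eof(const struct model_inode *ip, fsize_t io_offset,
		    fsize_t io_size)
{
	fsize_t end;
	fsize_t isize;

	assert(io_offset >= 0 && io_size >= 0);

	end = io_offset + io_size;
	isize = ip->vfs_size < end ? ip->vfs_size : end;
	return isize > ip->disk_size ? isize : 0;
}
```

With vfs_size at 8192 and disk_size at 4096, an I/O covering 4096..8192
yields 8192, an I/O covering 0..4096 yields 0, and an I/O that ends at 12288
is still clamped to 8192.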

Note that currently the generic direct I/O code only updates i_size after
calling our end_io handler, which requires a small workaround to make
sure di_size actually makes it to disk.  I hope to fix this properly in
the generic code.
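
The workaround can be pictured with another small userspace model.  This is
only a sketch under simplifying assumptions: model_setfilesize collapses the
whole completion path into one step, and all names here are invented for the
example rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t fsize_t;	/* hypothetical stand-in for xfs_fsize_t */

struct model_inode {
	fsize_t vfs_size;	/* models the VFS i_size */
	fsize_t disk_size;	/* models ip->i_d.di_size */
};

/* Completion step: grow the on-disk size, but never past vfs_size. */
static void
model_setfilesize(struct model_inode *ip, fsize_t offset, fsize_t size)
{
	fsize_t end = offset + size;
	fsize_t isize = ip->vfs_size < end ? ip->vfs_size : end;

	if (isize > ip->disk_size)
		ip->disk_size = isize;
}

/* Direct I/O completion with the workaround: bump vfs_size first. */
static void
model_end_io_direct_write(struct model_inode *ip, fsize_t offset,
			  fsize_t size)
{
	assert(offset >= 0 && size >= 0);

	if (offset + size > ip->vfs_size)
		ip->vfs_size = offset + size;
	model_setfilesize(ip, offset, size);
}
```

Calling model_setfilesize alone for an appending write leaves disk_size
untouched because vfs_size has not moved yet; going through
model_end_io_direct_write moves both sizes to the end of the write.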

A downside is that we lose the support for parallel non-overlapping O_DIRECT
appending writes that was recently added.  I don't think keeping the complex
and fragile i_new_size infrastructure for this is a good tradeoff - if we
really care about parallel appending writers we should investigate turning
the iolock into a range lock, which would also allow for parallel
non-overlapping buffered writers.

Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_aops.c  |   28 +++++++++++---------
 fs/xfs/xfs_file.c  |   72 +++++++----------------------------------------------
 fs/xfs/xfs_iget.c  |    1 
 fs/xfs/xfs_inode.h |    2 -
 fs/xfs/xfs_trace.h |   18 ++-----------
 5 files changed, 29 insertions(+), 92 deletions(-)

Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c	2011-11-30 12:59:11.669698558 +0100
+++ xfs/fs/xfs/xfs_file.c	2011-11-30 12:59:13.533021797 +0100
@@ -413,27 +413,6 @@ xfs_file_splice_read(
 }
 
 /*
- * If this was a direct or synchronous I/O that failed (such as ENOSPC) then
- * part of the I/O may have been written to disk before the error occurred.  In
- * this case the on-disk file size may have been adjusted beyond the in-memory
- * file size and now needs to be truncated back.
- */
-STATIC void
-xfs_aio_write_newsize_update(
-	struct xfs_inode	*ip,
-	xfs_fsize_t		new_size)
-{
-	if (new_size == ip->i_new_size) {
-		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
-		if (new_size == ip->i_new_size)
-			ip->i_new_size = 0;
-		if (ip->i_d.di_size > i_size_read(VFS_I(ip)))
-			ip->i_d.di_size = i_size_read(VFS_I(ip));
-		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-	}
-}
-
-/*
  * xfs_file_splice_write() does not use xfs_rw_ilock() because
  * generic_file_splice_write() takes the i_mutex itself. This, in theory,
  * couuld cause lock inversions between the aio_write path and the splice path
@@ -451,7 +430,6 @@ xfs_file_splice_write(
 {
 	struct inode		*inode = outfilp->f_mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	xfs_fsize_t		new_size;
 	int			ioflags = 0;
 	ssize_t			ret;
 
@@ -465,20 +443,12 @@ xfs_file_splice_write(
 
 	xfs_ilock(ip, XFS_IOLOCK_EXCL);
 
-	new_size = *ppos + count;
-
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if (new_size > i_size_read(inode))
-		ip->i_new_size = new_size;
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
 	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
 	if (ret > 0)
 		XFS_STATS_ADD(xs_write_bytes, ret);
 
-	xfs_aio_write_newsize_update(ip, new_size);
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 	return ret;
 }
@@ -673,16 +643,13 @@ xfs_file_aio_write_checks(
 	struct file		*file,
 	loff_t			*pos,
 	size_t			*count,
-	xfs_fsize_t		*new_sizep,
 	int			*iolock)
 {
 	struct inode		*inode = file->f_mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	xfs_fsize_t		new_size;
 	int			error = 0;
 
 	xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
-	*new_sizep = 0;
 restart:
 	error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
 	if (error) {
@@ -697,15 +664,13 @@ restart:
 	/*
 	 * If the offset is beyond the size of the file, we need to zero any
 	 * blocks that fall between the existing EOF and the start of this
-	 * write. There is no need to issue zeroing if another in-flght IO ends
-	 * at or before this one If zeronig is needed and we are currently
-	 * holding the iolock shared, we need to update it to exclusive which
-	 * involves dropping all locks and relocking to maintain correct locking
-	 * order. If we do this, restart the function to ensure all checks and
-	 * values are still valid.
+	 * write.  If zeroing is needed and we are currently holding the
+	 * iolock shared, we need to update it to exclusive which involves
+	 * dropping all locks and relocking to maintain correct locking order.
+	 * If we do this, restart the function to ensure all checks and values
+	 * are still valid.
 	 */
-	if ((ip->i_new_size && *pos > ip->i_new_size) ||
-	    (!ip->i_new_size && *pos > i_size_read(inode))) {
+	if (*pos > i_size_read(inode)) {
 		if (*iolock == XFS_IOLOCK_SHARED) {
 			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
 			*iolock = XFS_IOLOCK_EXCL;
@@ -714,19 +679,6 @@ restart:
 		}
 		error = -xfs_zero_eof(ip, *pos, i_size_read(inode));
 	}
-
-	/*
-	 * If this IO extends beyond EOF, we may need to update ip->i_new_size.
-	 * We have already zeroed space beyond EOF (if necessary).  Only update
-	 * ip->i_new_size if this IO ends beyond any other in-flight writes.
-	 */
-	new_size = *pos + *count;
-	if (new_size > i_size_read(inode)) {
-		if (new_size > ip->i_new_size)
-			ip->i_new_size = new_size;
-		*new_sizep = new_size;
-	}
-
 	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error)
 		return error;
@@ -772,7 +724,6 @@ xfs_file_dio_aio_write(
 	unsigned long		nr_segs,
 	loff_t			pos,
 	size_t			ocount,
-	xfs_fsize_t		*new_size,
 	int			*iolock)
 {
 	struct file		*file = iocb->ki_filp;
@@ -817,7 +768,7 @@ xfs_file_dio_aio_write(
 		xfs_rw_ilock(ip, *iolock);
 	}
 
-	ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
+	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
 	if (ret)
 		return ret;
 
@@ -855,7 +806,6 @@ xfs_file_buffered_aio_write(
 	unsigned long		nr_segs,
 	loff_t			pos,
 	size_t			ocount,
-	xfs_fsize_t		*new_size,
 	int			*iolock)
 {
 	struct file		*file = iocb->ki_filp;
@@ -869,7 +819,7 @@ xfs_file_buffered_aio_write(
 	*iolock = XFS_IOLOCK_EXCL;
 	xfs_rw_ilock(ip, *iolock);
 
-	ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
+	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
 	if (ret)
 		return ret;
 
@@ -909,7 +859,6 @@ xfs_file_aio_write(
 	ssize_t			ret;
 	int			iolock;
 	size_t			ocount = 0;
-	xfs_fsize_t		new_size = 0;
 
 	XFS_STATS_INC(xs_write_calls);
 
@@ -929,10 +878,10 @@ xfs_file_aio_write(
 
 	if (unlikely(file->f_flags & O_DIRECT))
 		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
-						ocount, &new_size, &iolock);
+						ocount, &iolock);
 	else
 		ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
-						ocount, &new_size, &iolock);
+						ocount, &iolock);
 
 	if (ret <= 0)
 		goto out_unlock;
@@ -953,7 +902,6 @@ xfs_file_aio_write(
 	}
 
 out_unlock:
-	xfs_aio_write_newsize_update(ip, new_size);
 	xfs_rw_iunlock(ip, iolock);
 	return ret;
 }
Index: xfs/fs/xfs/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_aops.c	2011-11-30 12:59:11.669698558 +0100
+++ xfs/fs/xfs/xfs_aops.c	2011-12-01 08:12:10.946664057 +0100
@@ -111,8 +111,7 @@ xfs_ioend_new_eof(
 	xfs_fsize_t		bsize;
 
 	bsize = ioend->io_offset + ioend->io_size;
-	isize = MAX(i_size_read(VFS_I(ip)), ip->i_new_size);
-	isize = MIN(isize, bsize);
+	isize = MIN(i_size_read(VFS_I(ip)), bsize);
 	return isize > ip->i_d.di_size ? isize : 0;
 }
 
@@ -126,11 +125,7 @@ static inline bool xfs_ioend_is_append(s
 }
 
 /*
- * Update on-disk file size now that data has been written to disk.  The
- * current in-memory file size is i_size.  If a write is beyond eof i_new_size
- * will be the intended file size until i_size is updated.  If this write does
- * not extend all the way to the valid file size then restrict this update to
- * the end of the write.
+ * Update on-disk file size now that data has been written to disk.
  *
  * This function does not block as blocking on the inode lock in IO completion
  * can lead to IO completion order dependency deadlocks.. If it can't get the
@@ -1279,6 +1274,15 @@ xfs_end_io_direct_write(
 	struct xfs_ioend	*ioend = iocb->private;
 
 	/*
+	 * While the generic direct I/O code updates the inode size, it does
+	 * so only after the end_io handler is called, which means our
+	 * end_io handler thinks the on-disk size is outside the in-core
+	 * size.  To prevent this just update it a little bit earlier here.
+	 */
+	if (offset + size > i_size_read(ioend->io_inode))
+		i_size_write(ioend->io_inode, offset + size);
+
+	/*
 	 * blockdev_direct_IO can return an error even after the I/O
 	 * completion handler was called.  Thus we need to protect
 	 * against double-freeing.
@@ -1340,12 +1344,10 @@ xfs_vm_write_failed(
 
 	if (to > inode->i_size) {
 		/*
-		 * punch out the delalloc blocks we have already allocated. We
-		 * don't call xfs_setattr() to do this as we may be in the
-		 * middle of a multi-iovec write and so the vfs inode->i_size
-		 * will not match the xfs ip->i_size and so it will zero too
-		 * much. Hence we jus truncate the page cache to zero what is
-		 * necessary and punch the delalloc blocks directly.
+		 * Punch out the delalloc blocks we have already allocated.
+		 *
+		 * Don't bother with xfs_setattr given that nothing can have
+		 * made it to disk as the page is still locked at this point.
 		 */
 		struct xfs_inode	*ip = XFS_I(inode);
 		xfs_fileoff_t		start_fsb;
Index: xfs/fs/xfs/xfs_iget.c
===================================================================
--- xfs.orig/fs/xfs/xfs_iget.c	2011-11-30 12:59:11.676365190 +0100
+++ xfs/fs/xfs/xfs_iget.c	2011-11-30 12:59:13.533021797 +0100
@@ -94,7 +94,6 @@ xfs_inode_alloc(
 	ip->i_update_core = 0;
 	ip->i_delayed_blks = 0;
 	memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
-	ip->i_new_size = 0;
 
 	return ip;
 }
Index: xfs/fs/xfs/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/xfs_trace.h	2011-11-30 12:59:11.673031874 +0100
+++ xfs/fs/xfs/xfs_trace.h	2011-11-30 12:59:13.536355113 +0100
@@ -891,7 +891,6 @@ DECLARE_EVENT_CLASS(xfs_file_class,
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
 		__field(xfs_fsize_t, size)
-		__field(xfs_fsize_t, new_size)
 		__field(loff_t, offset)
 		__field(size_t, count)
 		__field(int, flags)
@@ -900,17 +899,15 @@ DECLARE_EVENT_CLASS(xfs_file_class,
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
 		__entry->ino = ip->i_ino;
 		__entry->size = ip->i_d.di_size;
-		__entry->new_size = ip->i_new_size;
 		__entry->offset = offset;
 		__entry->count = count;
 		__entry->flags = flags;
 	),
-	TP_printk("dev %d:%d ino 0x%llx size 0x%llx new_size 0x%llx "
+	TP_printk("dev %d:%d ino 0x%llx size 0x%llx "
 		  "offset 0x%llx count 0x%zx ioflags %s",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->size,
-		  __entry->new_size,
 		  __entry->offset,
 		  __entry->count,
 		  __print_flags(__entry->flags, "|", XFS_IO_FLAGS))
@@ -978,7 +975,6 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
 		__field(loff_t, size)
-		__field(loff_t, new_size)
 		__field(loff_t, offset)
 		__field(size_t, count)
 		__field(int, type)
@@ -990,7 +986,6 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
 		__entry->ino = ip->i_ino;
 		__entry->size = ip->i_d.di_size;
-		__entry->new_size = ip->i_new_size;
 		__entry->offset = offset;
 		__entry->count = count;
 		__entry->type = type;
@@ -998,13 +993,11 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
 		__entry->startblock = irec ? irec->br_startblock : 0;
 		__entry->blockcount = irec ? irec->br_blockcount : 0;
 	),
-	TP_printk("dev %d:%d ino 0x%llx size 0x%llx new_size 0x%llx "
-		  "offset 0x%llx count %zd type %s "
-		  "startoff 0x%llx startblock %lld blockcount 0x%llx",
+	TP_printk("dev %d:%d ino 0x%llx size 0x%llx offset 0x%llx count %zd "
+		  "type %s startoff 0x%llx startblock %lld blockcount 0x%llx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->size,
-		  __entry->new_size,
 		  __entry->offset,
 		  __entry->count,
 		  __print_symbolic(__entry->type, XFS_IO_TYPES),
@@ -1031,7 +1024,6 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__field(xfs_ino_t, ino)
 		__field(loff_t, isize)
 		__field(loff_t, disize)
-		__field(loff_t, new_size)
 		__field(loff_t, offset)
 		__field(size_t, count)
 	),
@@ -1040,17 +1032,15 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
 		__entry->ino = ip->i_ino;
 		__entry->isize = VFS_I(ip)->i_size;
 		__entry->disize = ip->i_d.di_size;
-		__entry->new_size = ip->i_new_size;
 		__entry->offset = offset;
 		__entry->count = count;
 	),
-	TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx new_size 0x%llx "
+	TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx "
 		  "offset 0x%llx count %zd",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->isize,
 		  __entry->disize,
-		  __entry->new_size,
 		  __entry->offset,
 		  __entry->count)
 );
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:59:11.679698505 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-12-01 08:12:10.839997391 +0100
@@ -246,8 +246,6 @@ typedef struct xfs_inode {
 
 	xfs_icdinode_t		i_d;		/* most of ondisk inode */
 
-	xfs_fsize_t		i_new_size;	/* size when write completes */
-
 	/* VFS inode */
 	struct inode		i_vnode;	/* embedded VFS inode */
 } xfs_inode_t;

* [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-17 20:18   ` Ben Myers
  2011-12-18 20:00 ` [PATCH 11/11] xfs: cleanup xfs_file_aio_write Christoph Hellwig
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-xfs_file_aio_write_checks --]
[-- Type: text/plain, Size: 1341 bytes --]

While xfs_iunlock is fine with 0 lockflags, the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.
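
A toy userspace model of the convention, for illustration only: the lock is
reduced to a flag, and model_lock, model_write_checks and model_write are
hypothetical names, not kernel functions.  The point is that the checks
helper never drops the lock, so the caller can release it in exactly one
place.

```c
#include <assert.h>

struct model_lock {
	int held;	/* 1 while the (modelled) iolock is held */
};

/* Pre-write checks: called with the lock held, returns with it held. */
static int
model_write_checks(struct model_lock *iolock, int bad_request)
{
	assert(iolock->held);	/* caller must already hold the lock */

	if (bad_request)
		return -22;	/* stand-in for -EINVAL; lock stays held */
	return 0;
}

static int
model_write(struct model_lock *iolock, int bad_request)
{
	int ret;

	iolock->held = 1;
	ret = model_write_checks(iolock, bad_request);
	if (ret)
		goto out;

	ret = 1;		/* pretend one byte was written */
out:
	iolock->held = 0;	/* single unlock site for all paths */
	return ret;
}
```

Both the success path and the error path leave the flag cleared, without the
helper having to report back which locks it dropped.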

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_file.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c	2011-12-07 12:46:31.343897882 +0100
+++ xfs/fs/xfs/xfs_file.c	2011-12-07 12:48:33.309903801 +0100
@@ -636,7 +636,9 @@ out_lock:
 /*
  * Common pre-write limit and setup checks.
  *
- * Returns with iolock held according to @iolock.
+ * Called with the iolock held either shared or exclusive according to
+ * @iolock, and returns with it held.  Might upgrade the iolock to exclusive
+ * if called for a direct write beyond i_size.
  */
 STATIC ssize_t
 xfs_file_aio_write_checks(
@@ -653,8 +655,7 @@ xfs_file_aio_write_checks(
 restart:
 	error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
 	if (error) {
-		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
-		*iolock = 0;
+		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 		return error;
 	}
 

* [PATCH 11/11] xfs: cleanup xfs_file_aio_write
  2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2011-12-18 20:00 ` [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks Christoph Hellwig
@ 2011-12-18 20:00 ` Christoph Hellwig
  2012-01-17 20:42   ` Ben Myers
  10 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-12-18 20:00 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-xfs_file_aio_write --]
[-- Type: text/plain, Size: 5682 bytes --]

With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.
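
The resulting shape of the top-level function can be sketched in userspace as
follows.  This is a model, not the kernel code: the write and sync steps are
passed in as callbacks, and every name in it is invented for the example.
The sync callback stands in for generic_write_sync, and the write callback
for whichever of the two write paths is taken, each of which now takes and
drops the iolock itself.

```c
#include <assert.h>

typedef long (*model_write_fn)(void);
typedef int (*model_sync_fn)(long written);

/*
 * Dispatch the write, then sync only if something was written.  A sync
 * error replaces the byte count; a write error is returned unchanged.
 */
static long
model_aio_write(model_write_fn do_write, model_sync_fn do_sync)
{
	long ret;

	assert(do_write && do_sync);

	ret = do_write();
	if (ret > 0) {
		int err = do_sync(ret);

		if (err < 0)
			ret = err;
	}
	return ret;
}

/* Tiny stubs so the sketch can be exercised. */
static long write_ten(void) { return 10; }
static long write_fail(void) { return -28; }
static int sync_ok(long written) { (void)written; return 0; }
static int sync_fail(long written) { (void)written; return -5; }
```

A successful write followed by a successful sync returns the byte count, a
failing sync turns that into the sync error, and a failing write skips the
sync step entirely.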

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

---
 fs/xfs/xfs_file.c |   82 ++++++++++++++++++++++++------------------------------
 1 file changed, 37 insertions(+), 45 deletions(-)

Index: xfs/fs/xfs/xfs_file.c
===================================================================
--- xfs.orig/fs/xfs/xfs_file.c	2011-12-07 12:48:33.309903801 +0100
+++ xfs/fs/xfs/xfs_file.c	2011-12-07 12:48:48.546487926 +0100
@@ -724,8 +724,7 @@ xfs_file_dio_aio_write(
 	const struct iovec	*iovp,
 	unsigned long		nr_segs,
 	loff_t			pos,
-	size_t			ocount,
-	int			*iolock)
+	size_t			ocount)
 {
 	struct file		*file = iocb->ki_filp;
 	struct address_space	*mapping = file->f_mapping;
@@ -735,10 +734,10 @@ xfs_file_dio_aio_write(
 	ssize_t			ret = 0;
 	size_t			count = ocount;
 	int			unaligned_io = 0;
+	int			iolock;
 	struct xfs_buftarg	*target = XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp;
 
-	*iolock = 0;
 	if ((pos & target->bt_smask) || (count & target->bt_smask))
 		return -XFS_ERROR(EINVAL);
 
@@ -753,31 +752,31 @@ xfs_file_dio_aio_write(
 	 * EOF zeroing cases and fill out the new inode size as appropriate.
 	 */
 	if (unaligned_io || mapping->nrpages)
-		*iolock = XFS_IOLOCK_EXCL;
+		iolock = XFS_IOLOCK_EXCL;
 	else
-		*iolock = XFS_IOLOCK_SHARED;
-	xfs_rw_ilock(ip, *iolock);
+		iolock = XFS_IOLOCK_SHARED;
+	xfs_rw_ilock(ip, iolock);
 
 	/*
 	 * Recheck if there are cached pages that need invalidate after we got
 	 * the iolock to protect against other threads adding new pages while
 	 * we were waiting for the iolock.
 	 */
-	if (mapping->nrpages && *iolock == XFS_IOLOCK_SHARED) {
-		xfs_rw_iunlock(ip, *iolock);
-		*iolock = XFS_IOLOCK_EXCL;
-		xfs_rw_ilock(ip, *iolock);
+	if (mapping->nrpages && iolock == XFS_IOLOCK_SHARED) {
+		xfs_rw_iunlock(ip, iolock);
+		iolock = XFS_IOLOCK_EXCL;
+		xfs_rw_ilock(ip, iolock);
 	}
 
-	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+	ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
 	if (ret)
-		return ret;
+		goto out;
 
 	if (mapping->nrpages) {
 		ret = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
 							FI_REMAPF_LOCKED);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
 	/*
@@ -786,15 +785,18 @@ xfs_file_dio_aio_write(
 	 */
 	if (unaligned_io)
 		inode_dio_wait(inode);
-	else if (*iolock == XFS_IOLOCK_EXCL) {
+	else if (iolock == XFS_IOLOCK_EXCL) {
 		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
-		*iolock = XFS_IOLOCK_SHARED;
+		iolock = XFS_IOLOCK_SHARED;
 	}
 
 	trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
 	ret = generic_file_direct_write(iocb, iovp,
 			&nr_segs, pos, &iocb->ki_pos, count, ocount);
 
+out:
+	xfs_rw_iunlock(ip, iolock);
+
 	/* No fallback to buffered IO on errors for XFS. */
 	ASSERT(ret < 0 || ret == count);
 	return ret;
@@ -806,8 +808,7 @@ xfs_file_buffered_aio_write(
 	const struct iovec	*iovp,
 	unsigned long		nr_segs,
 	loff_t			pos,
-	size_t			ocount,
-	int			*iolock)
+	size_t			ocount)
 {
 	struct file		*file = iocb->ki_filp;
 	struct address_space	*mapping = file->f_mapping;
@@ -815,14 +816,14 @@ xfs_file_buffered_aio_write(
 	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret;
 	int			enospc = 0;
+	int			iolock = XFS_IOLOCK_EXCL;
 	size_t			count = ocount;
 
-	*iolock = XFS_IOLOCK_EXCL;
-	xfs_rw_ilock(ip, *iolock);
+	xfs_rw_ilock(ip, iolock);
 
-	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+	ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
 	if (ret)
-		return ret;
+		goto out;
 
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = mapping->backing_dev_info;
@@ -836,13 +837,15 @@ write_retry:
 	 * page locks and retry *once*
 	 */
 	if (ret == -ENOSPC && !enospc) {
-		ret = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
-		if (ret)
-			return ret;
 		enospc = 1;
-		goto write_retry;
+		ret = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
+		if (!ret)
+			goto write_retry;
 	}
+
 	current->backing_dev_info = NULL;
+out:
+	xfs_rw_iunlock(ip, iolock);
 	return ret;
 }
 
@@ -858,7 +861,6 @@ xfs_file_aio_write(
 	struct inode		*inode = mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret;
-	int			iolock;
 	size_t			ocount = 0;
 
 	XFS_STATS_INC(xs_write_calls);
@@ -878,32 +880,22 @@ xfs_file_aio_write(
 		return -EIO;
 
 	if (unlikely(file->f_flags & O_DIRECT))
-		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
-						ocount, &iolock);
+		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos, ocount);
 	else
 		ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
-						ocount, &iolock);
+						  ocount);
 
-	if (ret <= 0)
-		goto out_unlock;
+	if (ret > 0) {
+		ssize_t err;
 
-	XFS_STATS_ADD(xs_write_bytes, ret);
+		XFS_STATS_ADD(xs_write_bytes, ret);
 
-	/* Handle various SYNC-type writes */
-	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
-		loff_t end = pos + ret - 1;
-		int error;
-
-		xfs_rw_iunlock(ip, iolock);
-		error = xfs_file_fsync(file, pos, end,
-				      (file->f_flags & __O_SYNC) ? 0 : 1);
-		xfs_rw_ilock(ip, iolock);
-		if (error)
-			ret = error;
+		/* Handle various SYNC-type writes */
+		err = generic_write_sync(file, pos, ret);
+		if (err < 0)
+			ret = err;
 	}
 
-out_unlock:
-	xfs_rw_iunlock(ip, iolock);
 	return ret;
 }
 

* Re: [PATCH 09/11] xfs: remove the i_new_size field in struct xfs_inode
  2011-12-18 20:00 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
@ 2011-12-18 22:13   ` Dave Chinner
  2012-01-16 22:41   ` Ben Myers
  2012-01-17 20:14   ` Ben Myers
  2 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2011-12-18 22:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:12PM -0500, Christoph Hellwig wrote:
> Now that we use the VFS i_size field throughout XFS there is no need for the
> i_new_size field any more given that the VFS i_size field gets updated
> in ->write_end before unlocking the page, and thus is always up to date when
> writeback could see a page.  Removing i_new_size also has the advantage that
> we will never have to trim back di_size during a failed buffered write,
> given that it never gets updated past i_size.
> 
> Note that currently the generic direct I/O code only updates i_size after
> calling our end_io handler, which requires a small workaround to make
> sure di_size actually makes it to disk.  I hope to fix this properly in
> the generic code.
> 
> A downside is that we lose the support for parallel non-overlapping O_DIRECT
> appending writes that was recently added.  I don't think keeping the complex
> and fragile i_new_size infrastructure for this is a good tradeoff - if we
> really care about parallel appending writers we should investigate turning
> the iolock into a range lock, which would also allow for parallel
> non-overlapping buffered writers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 01/11] xfs: remove xfs_itruncate_data
  2011-12-18 20:00 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
@ 2012-01-03 21:53   ` Ben Myers
  2012-01-04  9:27     ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Myers @ 2012-01-03 21:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Hey Christoph,

On Sun, Dec 18, 2011 at 03:00:04PM -0500, Christoph Hellwig wrote:
> This wrapper isn't overly useful, not to say rather confusing.
> 
> Around the call to xfs_itruncate_extents it does:
> 
>  - add tracing
>  - add a few asserts in debug builds
>  - conditionally update the inode size in two places
>  - log the inode
> 
> Both the tracing and the inode logging can be moved to xfs_itruncate_extents
> as they are useful for the attribute fork as well - in fact the attr code
> already does an equivalent xfs_trans_log_inode call just after calling
> xfs_itruncate_extents.
>
> The conditional size updates are a mess, and there
> was no reason to do them in two places anyway, as the first one was
> conditional on the inode having extents - but without extents
> xfs_itruncate_extents would be a no-op and the placement wouldn't matter
> anyway.
>
> Instead move the size assignments and the asserts that make sense
> to the callers that want it.
> 
> As a side effect of this, clean up xfs_setattr_size by introducing variables
> for the old and new inode size, and moving the size updates into a common
> place.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.  I had some minor questions below.
Reviewed-by: Ben Myers <bpm@sgi.com>
 
> ---
>  fs/xfs/xfs_attr.c        |    4 -
>  fs/xfs/xfs_inode.c       |  124 +++--------------------------------------------
>  fs/xfs/xfs_inode.h       |    2 
>  fs/xfs/xfs_iops.c        |   47 +++++++++++------
>  fs/xfs/xfs_qm_syscalls.c |    9 +++
>  fs/xfs/xfs_trace.h       |    4 -
>  fs/xfs/xfs_vnodeops.c    |   17 +++++-
>  7 files changed, 65 insertions(+), 142 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_attr.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_attr.c	2011-11-30 12:58:07.820044461 +0100
> +++ xfs/fs/xfs/xfs_attr.c	2011-11-30 12:58:51.519807719 +0100
> @@ -827,10 +827,6 @@ xfs_attr_inactive(xfs_inode_t *dp)
>  	if (error)
>  		goto out;
>  
> -	/*
> -	 * Commit the last in the sequence of transactions.
> -	 */
> -	xfs_trans_log_inode(trans, dp, XFS_ILOG_CORE);
>  	error = xfs_trans_commit(trans, XFS_TRANS_RELEASE_LOG_RES);
>  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
>  
> Index: xfs/fs/xfs/xfs_inode.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.c	2011-11-30 12:58:07.830044408 +0100
> +++ xfs/fs/xfs/xfs_inode.c	2011-11-30 12:58:51.519807719 +0100
> @@ -1166,52 +1166,6 @@ xfs_ialloc(
>  }
>  
>  /*
> - * Check to make sure that there are no blocks allocated to the
> - * file beyond the size of the file.  We don't check this for
> - * files with fixed size extents or real time extents, but we
> - * at least do it for regular files.
> - */
> -#ifdef DEBUG
> -STATIC void
> -xfs_isize_check(
> -	struct xfs_inode	*ip,
> -	xfs_fsize_t		isize)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_fileoff_t		map_first;
> -	int			nimaps;
> -	xfs_bmbt_irec_t		imaps[2];
> -	int			error;
> -
> -	if (!S_ISREG(ip->i_d.di_mode))
> -		return;
> -
> -	if (XFS_IS_REALTIME_INODE(ip))
> -		return;
> -
> -	if (ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE)
> -		return;
> -
> -	nimaps = 2;
> -	map_first = XFS_B_TO_FSB(mp, (xfs_ufsize_t)isize);
> -	/*
> -	 * The filesystem could be shutting down, so bmapi may return
> -	 * an error.
> -	 */
> -	error = xfs_bmapi_read(ip, map_first,
> -			 (XFS_B_TO_FSB(mp,
> -			       (xfs_ufsize_t)XFS_MAXIOFFSET(mp)) - map_first),
> -			 imaps, &nimaps, XFS_BMAPI_ENTIRE);
> -	if (error)
> -		return;
> -	ASSERT(nimaps == 1);
> -	ASSERT(imaps[0].br_startblock == HOLESTARTBLOCK);
> -}
> -#else	/* DEBUG */
> -#define xfs_isize_check(ip, isize)
> -#endif	/* DEBUG */
> -
> -/*

You've tossed xfs_isize_check in the round filer, but you didn't mention
that in your commit message.  Isn't this still useful code?

>   * Free up the underlying blocks past new_size.  The new size must be smaller
>   * than the current size.  This routine can be used both for the attribute and
>   * data fork, and does not modify the inode size, which is left to the caller.
> @@ -1258,6 +1212,8 @@ xfs_itruncate_extents(
>  	ASSERT(ip->i_itemp->ili_lock_flags == 0);
>  	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
>  
> +	trace_xfs_itruncate_extents_start(ip, new_size);
> +
>  	/*
>  	 * Since it is possible for space to become allocated beyond
>  	 * the end of the file (in a crash where the space is allocated
> @@ -1325,6 +1281,14 @@ xfs_itruncate_extents(
>  			goto out;
>  	}
>  
> +	/*
> +	 * Always re-log the inode so that our permanent transaction can keep
> +	 * on rolling it forward in the log.
> +	 */
> +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);

Oh..  This is why it's ok not to log the inode core in
xfs_attr_inactive:  it calls xfs_itruncate_extents()...

> +
> +	trace_xfs_itruncate_extents_end(ip, new_size);
> +
>  out:
>  	*tpp = tp;
>  	return error;
> @@ -1338,74 +1302,6 @@ out_bmap_cancel:
>  	goto out;
>  }
>  
> -int
> -xfs_itruncate_data(
> -	struct xfs_trans	**tpp,
> -	struct xfs_inode	*ip,
> -	xfs_fsize_t		new_size)
> -{
> -	int			error;
> -
> -	trace_xfs_itruncate_data_start(ip, new_size);
> -
> -	/*
> -	 * The first thing we do is set the size to new_size permanently on
> -	 * disk.  This way we don't have to worry about anyone ever being able
> -	 * to look at the data being freed even in the face of a crash.
> -	 * What we're getting around here is the case where we free a block, it
> -	 * is allocated to another file, it is written to, and then we crash.
> -	 * If the new data gets written to the file but the log buffers
> -	 * containing the free and reallocation don't, then we'd end up with
> -	 * garbage in the blocks being freed.  As long as we make the new_size
> -	 * permanent before actually freeing any blocks it doesn't matter if
> -	 * they get written to.
> -	 */
> -	if (ip->i_d.di_nextents > 0) {
> -		/*
> -		 * If we are not changing the file size then do not update
> -		 * the on-disk file size - we may be called from
> -		 * xfs_inactive_free_eofblocks().  If we update the on-disk
> -		 * file size and then the system crashes before the contents
> -		 * of the file are flushed to disk then the files may be
> -		 * full of holes (ie NULL files bug).
> -		 */
> -		if (ip->i_size != new_size) {
> -			ip->i_d.di_size = new_size;
> -			ip->i_size = new_size;
> -			xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
> -		}
> -	}
> -
> -	error = xfs_itruncate_extents(tpp, ip, XFS_DATA_FORK, new_size);
> -	if (error)
> -		return error;
> -
> -	/*
> -	 * If we are not changing the file size then do not update the on-disk
> -	 * file size - we may be called from xfs_inactive_free_eofblocks().
> -	 * If we update the on-disk file size and then the system crashes
> -	 * before the contents of the file are flushed to disk then the files
> -	 * may be full of holes (ie NULL files bug).
> -	 */
> -	xfs_isize_check(ip, new_size);
> -	if (ip->i_size != new_size) {
> -		ip->i_d.di_size = new_size;
> -		ip->i_size = new_size;
> -	}
> -
> -	ASSERT(new_size != 0 || ip->i_delayed_blks == 0);

You didn't pull this assert along with 

> -	ASSERT(new_size != 0 || ip->i_d.di_nextents == 0);

this one into xfs_qm_scall_trunc_qfile.  Was that intentional?

> -
> -	/*
> -	 * Always re-log the inode so that our permanent transaction can keep
> -	 * on rolling it forward in the log.
> -	 */
> -	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
> -
> -	trace_xfs_itruncate_data_end(ip, new_size);
> -	return 0;
> -}
> -
>  /*
>   * This is called when the inode's link count goes to 0.
>   * We place the on-disk inode on a list in the AGI.  It
> Index: xfs/fs/xfs/xfs_inode.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:58:08.843372251 +0100
> +++ xfs/fs/xfs/xfs_inode.h	2011-11-30 12:58:51.523141034 +0100
> @@ -491,8 +491,6 @@ int		xfs_ifree(struct xfs_trans *, xfs_i
>  			   struct xfs_bmap_free *);
>  int		xfs_itruncate_extents(struct xfs_trans **, struct xfs_inode *,
>  				      int, xfs_fsize_t);
> -int		xfs_itruncate_data(struct xfs_trans **, struct xfs_inode *,
> -				   xfs_fsize_t);
>  int		xfs_iunlink(struct xfs_trans *, xfs_inode_t *);
>  
>  void		xfs_iext_realloc(xfs_inode_t *, int, int);
> Index: xfs/fs/xfs/xfs_iops.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_iops.c	2011-11-30 12:58:07.856710929 +0100
> +++ xfs/fs/xfs/xfs_iops.c	2011-11-30 12:58:51.523141034 +0100
> @@ -750,6 +750,7 @@ xfs_setattr_size(
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct inode		*inode = VFS_I(ip);
>  	int			mask = iattr->ia_valid;
> +	xfs_off_t		oldsize, newsize;
>  	struct xfs_trans	*tp;
>  	int			error;
>  	uint			lock_flags;
> @@ -777,11 +778,13 @@ xfs_setattr_size(
>  		lock_flags |= XFS_IOLOCK_EXCL;
>  	xfs_ilock(ip, lock_flags);
>  
> +	oldsize = ip->i_size;
> +	newsize = iattr->ia_size;
> +
>  	/*
>  	 * Short circuit the truncate case for zero length files.
>  	 */
> -	if (iattr->ia_size == 0 &&
> -	    ip->i_size == 0 && ip->i_d.di_nextents == 0) {
> +	if (newsize == 0 && oldsize == 0 && ip->i_d.di_nextents == 0) {
>  		if (!(mask & (ATTR_CTIME|ATTR_MTIME)))
>  			goto out_unlock;
>  
> @@ -807,14 +810,14 @@ xfs_setattr_size(
>  	 * the inode to the transaction, because the inode cannot be unlocked
>  	 * once it is a part of the transaction.
>  	 */
> -	if (iattr->ia_size > ip->i_size) {
> +	if (newsize > oldsize) {
>  		/*
>  		 * Do the first part of growing a file: zero any data in the
>  		 * last block that is beyond the old EOF.  We need to do this
>  		 * before the inode is joined to the transaction to modify
>  		 * i_size.
>  		 */
> -		error = xfs_zero_eof(ip, iattr->ia_size, ip->i_size);
> +		error = xfs_zero_eof(ip, newsize, oldsize);
>  		if (error)
>  			goto out_unlock;
>  	}
> @@ -833,8 +836,8 @@ xfs_setattr_size(
>  	 * here and prevents waiting for other data not within the range we
>  	 * care about here.
>  	 */
> -	if (ip->i_size != ip->i_d.di_size && iattr->ia_size > ip->i_d.di_size) {
> -		error = xfs_flush_pages(ip, ip->i_d.di_size, iattr->ia_size, 0,
> +	if (oldsize != ip->i_d.di_size && newsize > ip->i_d.di_size) {
> +		error = xfs_flush_pages(ip, ip->i_d.di_size, newsize, 0,
>  					FI_NONE);
>  		if (error)
>  			goto out_unlock;
> @@ -845,8 +848,7 @@ xfs_setattr_size(
>  	 */
>  	inode_dio_wait(inode);
>  
> -	error = -block_truncate_page(inode->i_mapping, iattr->ia_size,
> -				     xfs_get_blocks);
> +	error = -block_truncate_page(inode->i_mapping, newsize, xfs_get_blocks);
>  	if (error)
>  		goto out_unlock;
>  
> @@ -857,7 +859,7 @@ xfs_setattr_size(
>  	if (error)
>  		goto out_trans_cancel;
>  
> -	truncate_setsize(inode, iattr->ia_size);
> +	truncate_setsize(inode, newsize);
>  
>  	commit_flags = XFS_TRANS_RELEASE_LOG_RES;
>  	lock_flags |= XFS_ILOCK_EXCL;
> @@ -876,19 +878,30 @@ xfs_setattr_size(
>  	 * these flags set.  For all other operations the VFS set these flags
>  	 * explicitly if it wants a timestamp update.
>  	 */
> -	if (iattr->ia_size != ip->i_size &&
> -	    (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
> +	if (newsize != oldsize && (!(mask & (ATTR_CTIME | ATTR_MTIME)))) {
>  		iattr->ia_ctime = iattr->ia_mtime =
>  			current_fs_time(inode->i_sb);
>  		mask |= ATTR_CTIME | ATTR_MTIME;
>  	}
>  
> -	if (iattr->ia_size > ip->i_size) {
> -		ip->i_d.di_size = iattr->ia_size;
> -		ip->i_size = iattr->ia_size;
> -	} else if (iattr->ia_size <= ip->i_size ||
> -		   (iattr->ia_size == 0 && ip->i_d.di_nextents)) {
> -		error = xfs_itruncate_data(&tp, ip, iattr->ia_size);
> +	/*
> +	 * The first thing we do is set the size to new_size permanently on
> +	 * disk.  This way we don't have to worry about anyone ever being able
> +	 * to look at the data being freed even in the face of a crash.
> +	 * What we're getting around here is the case where we free a block, it
> +	 * is allocated to another file, it is written to, and then we crash.
> +	 * If the new data gets written to the file but the log buffers
> +	 * containing the free and reallocation don't, then we'd end up with
> +	 * garbage in the blocks being freed.  As long as we make the new size
> +	 * permanent before actually freeing any blocks it doesn't matter if
> +	 * they get written to.
> +	 */
> +	ip->i_d.di_size = newsize;
> +	ip->i_size = newsize;
> +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +
> +	if (newsize <= oldsize) {
> +		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);
>  		if (error)
>  			goto out_trans_abort;

This newsize/oldsize section is really a nice cleanup.
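
One thing the snapshot buys: now that the size update is hoisted above the
truncate, the shrink test has to compare against the value read before the
update.  A minimal userspace sketch of that ordering follows; model_inode
and model_setattr_size are hypothetical stand-ins rather than XFS code, and
only the order of operations is taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few inode fields the hunk touches. */
struct model_inode {
	long long	di_size;		/* on-disk size */
	long long	i_size;			/* in-core size */
	bool		size_logged;		/* models xfs_trans_log_inode() */
	bool		extents_truncated;	/* models xfs_itruncate_extents() */
};

/*
 * Model of the tail of xfs_setattr_size after the patch: snapshot the old
 * size, update and log the new size first, then free extents on shrink.
 */
static inline void
model_setattr_size(struct model_inode *ip, long long newsize)
{
	long long	oldsize = ip->i_size;	/* read before the update */

	ip->di_size = newsize;
	ip->i_size = newsize;
	ip->size_logged = true;

	/* Comparing against ip->i_size here would always see newsize. */
	if (newsize <= oldsize) {
		assert(ip->size_logged);	/* size logged before blocks go */
		ip->extents_truncated = true;
	}
}
```

With the snapshot, a grow skips the extent truncation while a shrink (or a
same-size call) still reaches it.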

>  
> Index: xfs/fs/xfs/xfs_qm_syscalls.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_qm_syscalls.c	2011-11-30 12:58:07.866710876 +0100
> +++ xfs/fs/xfs/xfs_qm_syscalls.c	2011-11-30 12:58:51.523141034 +0100
> @@ -31,6 +31,7 @@
>  #include "xfs_mount.h"
>  #include "xfs_bmap_btree.h"
>  #include "xfs_inode.h"
> +#include "xfs_inode_item.h"
>  #include "xfs_itable.h"
>  #include "xfs_bmap.h"
>  #include "xfs_rtalloc.h"
> @@ -263,13 +264,19 @@ xfs_qm_scall_trunc_qfile(
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, 0);
>  

> -	error = xfs_itruncate_data(&tp, ip, 0);
> +	ip->i_d.di_size = 0;
> +	ip->i_size = 0;
> +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +
> +	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);

Is it worth a comment that you log the size here because of the
possibility of another file being allocated these extents in the face of
a crash?

>  	if (error) {
>  		xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
>  				     XFS_TRANS_ABORT);
>  		goto out_unlock;
>  	}
>  
> +	ASSERT(ip->i_d.di_nextents == 0);
> +
>  	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
>  	error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
>  
> Index: xfs/fs/xfs/xfs_trace.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_trace.h	2011-11-30 12:58:07.876710822 +0100
> +++ xfs/fs/xfs/xfs_trace.h	2011-11-30 12:58:51.523141034 +0100
> @@ -1090,8 +1090,8 @@ DECLARE_EVENT_CLASS(xfs_itrunc_class,
>  DEFINE_EVENT(xfs_itrunc_class, name, \
>  	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size), \
>  	TP_ARGS(ip, new_size))
> -DEFINE_ITRUNC_EVENT(xfs_itruncate_data_start);
> -DEFINE_ITRUNC_EVENT(xfs_itruncate_data_end);
> +DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_start);
> +DEFINE_ITRUNC_EVENT(xfs_itruncate_extents_end);
>  
>  TRACE_EVENT(xfs_pagecache_inval,
>  	TP_PROTO(struct xfs_inode *ip, xfs_off_t start, xfs_off_t finish),
> Index: xfs/fs/xfs/xfs_vnodeops.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-11-30 12:58:07.893377397 +0100
> +++ xfs/fs/xfs/xfs_vnodeops.c	2011-11-30 12:58:51.526474350 +0100
> @@ -226,7 +226,14 @@ xfs_free_eofblocks(
>  		xfs_ilock(ip, XFS_ILOCK_EXCL);
>  		xfs_trans_ijoin(tp, ip, 0);
>  
> -		error = xfs_itruncate_data(&tp, ip, ip->i_size);
> +		/*
> +		 * Do not update the on-disk file size.  If we update the
> +		 * on-disk file size and then the system crashes before the
> +		 * contents of the file are flushed to disk then the files
> +		 * may be full of holes (ie NULL files bug).
> +		 */
> +		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
> +					      ip->i_size);
>  		if (error) {
>  			/*
>  			 * If we get an error at this point we simply don't
> @@ -670,13 +677,19 @@ xfs_inactive(
>  		xfs_ilock(ip, XFS_ILOCK_EXCL);
>  		xfs_trans_ijoin(tp, ip, 0);
>  
> -		error = xfs_itruncate_data(&tp, ip, 0);
> +		ip->i_d.di_size = 0;
> +		ip->i_size = 0;
> +		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);

Maybe a comment here too?

> +
> +		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
>  		if (error) {
>  			xfs_trans_cancel(tp,
>  				XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
>  			xfs_iunlock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
>  			return VN_INACTIVE_CACHE;
>  		}
> +
> +		ASSERT(ip->i_d.di_nextents == 0);
>  	} else if (S_ISLNK(ip->i_d.di_mode)) {
>  
>  		/*
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/11] xfs: remove xfs_itruncate_data
  2012-01-03 21:53   ` Ben Myers
@ 2012-01-04  9:27     ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2012-01-04  9:27 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

> You've tossed xfs_isize_check in the round filer, but you didn't mention
> that in your commit message.  Isn't this still useful code?

It falls under the removal of a few asserts in the debug build.  I don't
think it's a useful check.

> > -	if (ip->i_size != new_size) {
> > -		ip->i_d.di_size = new_size;
> > -		ip->i_size = new_size;
> > -	}
> > -
> > -	ASSERT(new_size != 0 || ip->i_delayed_blks == 0);
> 
> You didn't pull this assert along with 
> 
> > -	ASSERT(new_size != 0 || ip->i_d.di_nextents == 0);
> 
> this one into xfs_qm_scall_trunc_qfile.  Was that intentional?


We never have delayed blocks on the quota files, so there's not much
of a point to have this assert.

> > -	error = xfs_itruncate_data(&tp, ip, 0);
> > +	ip->i_d.di_size = 0;
> > +	ip->i_size = 0;
> > +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > +
> > +	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
> 
> > Is it worth a comment that you log the size here because of the
> possibility of another file being allocated these extents in the face of
> a crash?

I don't think there's a point in duplicating that comment.  Eventually
I'll see if we can simply use the existing truncate code for the quota
files anyway.

> > -		error = xfs_itruncate_data(&tp, ip, 0);
> > +		ip->i_d.di_size = 0;
> > +		ip->i_size = 0;
> > +		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> 
> Maybe a comment here too?

We log inodes in tons of places without adding a comment every time.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb
  2011-12-18 20:00 ` [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb Christoph Hellwig
@ 2012-01-04 20:32   ` Ben Myers
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-04 20:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:05PM -0500, Christoph Hellwig wrote:
> Replace the nasty if, else if, else if condition with more natural C flow
> that expresses the logic we want here better.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> ---
>  fs/xfs/xfs_iomap.c |   36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_iomap.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_iomap.c	2011-11-17 12:07:52.580802800 +0100
> +++ xfs/fs/xfs/xfs_iomap.c	2011-11-30 10:57:35.649224495 +0100
> @@ -57,26 +57,26 @@ xfs_iomap_eof_align_last_fsb(
>  	xfs_fileoff_t	*last_fsb)
>  {
>  	xfs_fileoff_t	new_last_fsb = 0;
> -	xfs_extlen_t	align;
> +	xfs_extlen_t	align = 0;
>  	int		eof, error;
>  
> -	if (XFS_IS_REALTIME_INODE(ip))
> -		;
> -	/*
> -	 * If mounted with the "-o swalloc" option, roundup the allocation
> -	 * request to a stripe width boundary if the file size is >=
> -	 * stripe width and we are allocating past the allocation eof.
> -	 */
> -	else if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC) &&
> -	        (ip->i_size >= XFS_FSB_TO_B(mp, mp->m_swidth)))
> -		new_last_fsb = roundup_64(*last_fsb, mp->m_swidth);
> -	/*
> -	 * Roundup the allocation request to a stripe unit (m_dalign) boundary
> -	 * if the file size is >= stripe unit size, and we are allocating past
> -	 * the allocation eof.
> -	 */
> -	else if (mp->m_dalign && (ip->i_size >= XFS_FSB_TO_B(mp, mp->m_dalign)))
> -		new_last_fsb = roundup_64(*last_fsb, mp->m_dalign);
> +	if (!XFS_IS_REALTIME_INODE(ip)) {
> +		/*
> +		 * Round up the allocation request to a stripe unit
> +		 * (m_dalign) boundary if the file size is >= stripe unit
> +		 * size, and we are allocating past the allocation eof.
> +		 *
> +		 * If mounted with the "-o swalloc" option the alignment is
> +		 * increased from the stripe unit size to the stripe width.
> +		 */
> +		if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
> +			align = mp->m_swidth;
> +		else if (mp->m_dalign)
> +			align = mp->m_dalign;
> +
> +		if (align && ip->i_size >= XFS_FSB_TO_B(mp, align))
> +			new_last_fsb = roundup_64(*last_fsb, align);
> +	}

The behavior looks equivalent and this is much easier on the eyes.
Looks good.
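
For anyone who wants to poke at the two versions side by side, here is a
rough userspace model.  The struct and function names are made up for
illustration, sizes are in filesystem blocks, and the realtime check and the
eof test are left out, so treat it as a sketch under those assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mount geometry. */
struct model_mount {
	uint64_t	swidth;		/* stripe width in blocks, 0 if unset */
	uint64_t	dalign;		/* stripe unit in blocks, 0 if unset */
	bool		swalloc;	/* models XFS_MOUNT_SWALLOC */
	uint64_t	blocksize;	/* bytes per block, models XFS_FSB_TO_B */
};

static inline uint64_t
model_roundup(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/* The if / else if chain being removed; 0 means "no rounding". */
static inline uint64_t
align_last_fsb_old(const struct model_mount *mp, uint64_t isize,
		   uint64_t last_fsb)
{
	if (mp->swidth && mp->swalloc && isize >= mp->swidth * mp->blocksize)
		return model_roundup(last_fsb, mp->swidth);
	if (mp->dalign && isize >= mp->dalign * mp->blocksize)
		return model_roundup(last_fsb, mp->dalign);
	return 0;
}

/* The replacement: pick the alignment first, then test the size once. */
static inline uint64_t
align_last_fsb_new(const struct model_mount *mp, uint64_t isize,
		   uint64_t last_fsb)
{
	uint64_t	align = 0;

	if (mp->swidth && mp->swalloc)
		align = mp->swidth;
	else if (mp->dalign)
		align = mp->dalign;

	if (align && isize >= align * mp->blocksize)
		return model_roundup(last_fsb, align);
	return 0;
}
```

In this model the two agree whenever swalloc is off or the file is at least
a stripe width long.  They differ when swalloc is set and the file size lies
between one stripe unit and one stripe width: the old chain fell through to
stripe-unit rounding there, the new code does no rounding.  Whether that
corner matters in practice is not something the sketch can answer.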

Reviewed-by: Ben Myers <bpm@sgi.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/11] xfs: remove the unused dm_attrs structure
  2011-12-18 20:00 ` [PATCH 03/11] xfs: remove the unused dm_attrs structure Christoph Hellwig
@ 2012-01-04 21:13   ` Ben Myers
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-04 21:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:06PM -0500, Christoph Hellwig wrote:
> .. and the just as dead bhv_desc forward declaration while we're at it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Alex Elder <aelder@sgi.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Looks good.
Reviewed-by: Ben Myers <bpm@sgi.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork
  2011-12-18 20:00 ` [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork Christoph Hellwig
@ 2012-01-06 16:58   ` Ben Myers
  2012-01-16 22:45     ` Ben Myers
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Myers @ 2012-01-06 16:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:07PM -0500, Christoph Hellwig wrote:
> We spent a lot of effort to maintain this field, but it always equalts to the
								 equals the
> fork size divided by the constant size of an extent.  The prime use of it is
> to assert that the two stay in sync.  Just divide the fork size by the extent
> size in the few places that we actually use it and remove the overhead
> of maintaining it.  Also introduce a few helpers to consolidate the places
> where we actually care about the value.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

After reviewing this patch it's not crystal clear to me why we were
putting all that effort into keeping this counter up to date on the inode
instead of using helpers like you've implemented.  Maybe it was a question
of integer division, as Dave suggested.  This is a nice improvement.
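
As a rough illustration of deriving the value on demand, here is a
userspace sketch.  model_bmbt_rec and model_fork_maxext are hypothetical
names, and the 16-byte record (two 64-bit words) is my assumption about the
extent layout, so check it against the real headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk extent record: two 64-bit words. */
struct model_bmbt_rec {
	uint64_t	l0;
	uint64_t	l1;
};

static_assert(sizeof(struct model_bmbt_rec) == 16,
	      "the sketch assumes a 16-byte extent record");

/* Derive the extent capacity of a fork from its size in bytes. */
static inline unsigned int
model_fork_maxext(unsigned int fork_bytes)
{
	return (unsigned int)(fork_bytes / sizeof(struct model_bmbt_rec));
}
```

Because the divisor is a compile-time power of two, a compiler will
typically turn the division into a shift, so the cost of recomputing should
be small.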

> Index: xfs/fs/xfs/xfs_bmap.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_bmap.c	2011-12-12 10:33:55.748696870 -0800
> +++ xfs/fs/xfs/xfs_bmap.c	2011-12-14 05:15:20.612373687 -0800
> @@ -249,7 +249,27 @@ xfs_bmbt_lookup_ge(
>  }
>  
>  /*
> -* Update the record referred to by cur to the value given
> + * Check if the inode needs to be converted to btree format.
> + */
> +static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
> +{
> +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> +		XFS_IFORK_NEXTENTS(ip, whichfork) >
> +			XFS_IFORK_MAXEXT(ip, whichfork);
> +}
> +
> +/*
> + * Check if the inode should be converted to extent format.
> + */
> +static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
> +{
> +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
> +		XFS_IFORK_NEXTENTS(ip, whichfork) <=
> +			XFS_IFORK_MAXEXT(ip, whichfork);
> +}

The logic in these two appears to be equivalent to the code you've
replaced in all but one case...

...

> @@ -5321,8 +5318,7 @@ xfs_bunmapi(
>  		 * will be dirty.
>  		 */
>  		if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
> -		    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> -		    XFS_IFORK_NEXTENTS(ip, whichfork) >= ifp->if_ext_max &&
						      ^^ 
All other tests for this were:
XFS_IFORK_NEXTENTS(ip, whichfork) > ifp->if_ext_max

Did you just fix a lurking off-by-one or insert one?

xfs_bmap_needs_btree needs ip->i_d.di_nextents to have been incremented 
already in order to detect that we need to convert to btree format.  In
this case we haven't done that yet and are checking to see if doing so
would require conversion to btree format... 

Looks to me like we can't use xfs_bmap_needs_btree here and should use
the old logic.  Right?
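
To spell out the difference, here is a small sketch with made-up helper
names.  It ignores the fork format test, i.e. it assumes the fork is already
in extent format, and only models the count comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Post-insert test, as in the new helper: the count exceeds capacity. */
static inline bool
model_needs_btree(unsigned int nextents, unsigned int maxext)
{
	return nextents > maxext;
}

/* Pre-insert test, as in the old xfs_bunmapi line: one more won't fit. */
static inline bool
model_would_need_btree(unsigned int nextents, unsigned int maxext)
{
	return nextents >= maxext;
}

/* For integer counts, "n >= max" is the same as "n + 1 > max". */
static inline void
model_check_equivalence(unsigned int n, unsigned int maxext)
{
	assert(model_would_need_btree(n, maxext) ==
	       model_needs_btree(n + 1, maxext));
}
```

With nextents == maxext the two disagree: the pre-insert test fires and the
helper does not, which is exactly the case this hunk sits on.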

> +		    xfs_bmap_needs_btree(ip, whichfork) &&
>  		    del.br_startoff > got.br_startoff &&
>  		    del.br_startoff + del.br_blockcount <
>  		    got.br_startoff + got.br_blockcount) {

...

> @@ -180,18 +182,25 @@ xfs_swap_extents_check_format(
>  	 * (a common defrag case) which will occur when the temp inode is in
>  	 * extent format...
>  	 */
> -	if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
> -	    ((XFS_IFORK_BOFF(ip) &&
> -	      tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip)) ||
> -	     XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <= ip->i_df.if_ext_max))
> -		return EINVAL;
> +	if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> +		if (XFS_IFORK_BOFF(ip) &&
> +		    tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
> +			return EINVAL;
> +		if (XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) <=
> +		    XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK))
> +			return EINVAL;
> +	}
>  
>  	/* Reciprocal target->temp btree format checks */
> -	if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
> -	    ((XFS_IFORK_BOFF(tip) &&
> -	      ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip)) ||
> -	     XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <= tip->i_df.if_ext_max))
> -		return EINVAL;
> +	if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> +		if (XFS_IFORK_BOFF(tip) &&
> +		    ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
> +			return EINVAL;
> +
> +		if (XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) <=
> +		    XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK))
> +			return EINVAL;
> +	}

Good readability improvement.

-Ben

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock
  2011-12-18 20:00 ` [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock Christoph Hellwig
@ 2012-01-13 21:49   ` Ben Myers
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-13 21:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:09PM -0500, Christoph Hellwig wrote:
> Index: xfs/fs/xfs/xfs_inode.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.c	2011-12-18 08:05:01.469974560 -0800
> +++ xfs/fs/xfs/xfs_inode.c	2011-12-18 08:06:23.266640737 -0800

...

> -/*
>   * In-core inode flags.
>   */
> -#define XFS_IRECLAIM		0x0001  /* started reclaiming this inode */
> -#define XFS_ISTALE		0x0002	/* inode has been staled */
> -#define XFS_IRECLAIMABLE	0x0004	/* inode can be reclaimed */
> -#define XFS_INEW		0x0008	/* inode has just been allocated */
> -#define XFS_IFILESTREAM		0x0010	/* inode is in a filestream directory */
> -#define XFS_ITRUNCATED		0x0020	/* truncated down so flush-on-close */
> -#define XFS_IDIRTY_RELEASE	0x0040	/* dirty release already seen */
> +#define XFS_IRECLAIM		(1 << 0) /* started reclaiming this inode */
> +#define XFS_ISTALE		(1 << 1) /* inode has been staled */
> +#define XFS_IRECLAIMABLE	(1 << 2) /* inode can be reclaimed */
> +#define XFS_INEW		(1 << 3) /* inode has just been allocated */
> +#define XFS_IFILESTREAM		(1 << 4) /* inode is in a filestream dir. */
> +#define XFS_ITRUNCATED		(1 << 5) /* truncated down so flush-on-close */
> +#define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
> +#define __XFS_IFLOCK_BIT	7	 /* inode is being flushed right now */
> +#define XFS_IFLOCK		(1 << __XFS_IFLOCK_BIT)

Nice.

> +static inline void xfs_iflock(struct xfs_inode *ip)
> +{
> +	if (!xfs_iflock_nowait(ip))
> +		__xfs_iflock(ip);
> +}

Going after the iflock nowait first seems a little silly, but I'm
guessing that you're trying to avoid touching the zone wait_table in
bit_waitqueue if you can avoid it, along with the rest of the overhead
related to setting up the wait queue in __xfs_iflock.

> +
> +static inline void xfs_ifunlock(struct xfs_inode *ip)
> +{
> +	xfs_iflags_clear(ip, XFS_IFLOCK);
> +	wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
> +}

I was going to suggest this:

	spin_lock(&ip->i_flags_lock); 
	ip->i_flags &= ~XFS_IFLOCK;
	wake_up_bit(... 
	spin_unlock(&ip->i_flags_lock);

After some study I believe what you have is ok:

Say this is process A.  If process B gets the lock after A clears IFLOCK
and before A's wake_up_bit wakes process C, then C will just go back to
sleep in __xfs_iflock until B calls xfs_ifunlock to wake C again.
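
The interleaving above can be walked through with a userspace sketch.  Note
the substitution: the patch manipulates i_flags under i_flags_lock, whereas
this model uses C11 atomics, and every model_* name is hypothetical.  It
only shows why a late wakeup is harmless, not how the kernel sleeps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_IFLOCK_BIT	7
#define MODEL_IFLOCK		(1UL << MODEL_IFLOCK_BIT)

/* Try to take the flush lock; true if this caller set the bit. */
static inline bool
model_iflock_nowait(atomic_ulong *flags)
{
	return !(atomic_fetch_or(flags, MODEL_IFLOCK) & MODEL_IFLOCK);
}

/* Drop the flush lock; the kernel version then calls wake_up_bit(). */
static inline void
model_ifunlock(atomic_ulong *flags)
{
	unsigned long	old = atomic_fetch_and(flags, ~MODEL_IFLOCK);

	assert(old & MODEL_IFLOCK);	/* only the holder may unlock */
}

/*
 * What a woken waiter does: retry the trylock.  If another task took the
 * lock between the clear and the wakeup this fails, and the waiter would
 * simply go back to sleep.
 */
static inline bool
model_waiter_retry(atomic_ulong *flags)
{
	return model_iflock_nowait(flags);
}
```

Playing A, B and C in sequence: A unlocks, B takes the lock, C's retry
fails, B unlocks, and C's next retry succeeds.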

Looks good.

Reviewed-by: Ben Myers <bpm@sgi.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue
  2011-12-18 20:00 ` [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue Christoph Hellwig
@ 2012-01-13 22:42   ` Ben Myers
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-13 22:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:10PM -0500, Christoph Hellwig wrote:
> Replace i_pin_wait, which is only used during synchronous inode flushing
> with a bit waitqueue.  This trades off a much smaller inode against
> slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
> bytes in the XFS inode.
> 
> Reviewed-by: Alex Elder <aelder@sgi.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Ben Myers <bpm@sgi.com>

> 
> ---
>  fs/xfs/xfs_inode.c      |   27 +++++++++++++++++++++------
>  fs/xfs/xfs_inode.h      |    3 ++-
>  fs/xfs/xfs_inode_item.c |    2 +-
>  fs/xfs/xfs_super.c      |    1 -
>  4 files changed, 24 insertions(+), 9 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_inode.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.c	2011-11-30 12:59:08.843047205 +0100
> +++ xfs/fs/xfs/xfs_inode.c	2011-11-30 12:59:10.296372665 +0100
> @@ -2037,7 +2037,7 @@ xfs_idestroy_fork(
>   * once someone is waiting for it to be unpinned.
>   */
>  static void
> -xfs_iunpin_nowait(
> +xfs_iunpin(
>  	struct xfs_inode	*ip)
>  {
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
> @@ -2049,14 +2049,29 @@ xfs_iunpin_nowait(
>  
>  }
>  
> +static void
> +__xfs_iunpin_wait(
> +	struct xfs_inode	*ip)
> +{
> +	wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IPINNED_BIT);
> +	DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IPINNED_BIT);
> +
> +	xfs_iunpin(ip);
> +
> +	do {
> +		prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
> +		if (xfs_ipincount(ip))
> +			io_schedule();
> +	} while (xfs_ipincount(ip));
> +	finish_wait(wq, &wait.wait);
> +}
> +
>  void
>  xfs_iunpin_wait(
>  	struct xfs_inode	*ip)
>  {
> -	if (xfs_ipincount(ip)) {
> -		xfs_iunpin_nowait(ip);
> -		wait_event(ip->i_ipin_wait, (xfs_ipincount(ip) == 0));
> -	}
> +	if (xfs_ipincount(ip))
> +		__xfs_iunpin_wait(ip);
>  }
>  
>  /*
> @@ -2415,7 +2430,7 @@ xfs_iflush(
>  	 * out for us if they occur after the log force completes.
>  	 */
>  	if (!(flags & SYNC_WAIT) && xfs_ipincount(ip)) {
> -		xfs_iunpin_nowait(ip);
> +		xfs_iunpin(ip);
>  		xfs_ifunlock(ip);
>  		return EAGAIN;
>  	}
> Index: xfs/fs/xfs/xfs_inode.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:59:08.846380520 +0100
> +++ xfs/fs/xfs/xfs_inode.h	2011-11-30 12:59:10.296372665 +0100
> @@ -238,7 +238,6 @@ typedef struct xfs_inode {
>  	mrlock_t		i_lock;		/* inode lock */
>  	mrlock_t		i_iolock;	/* inode IO lock */
>  	atomic_t		i_pincount;	/* inode pin count */
> -	wait_queue_head_t	i_ipin_wait;	/* inode pinning wait queue */
>  	spinlock_t		i_flags_lock;	/* inode i_flags lock */
>  	/* Miscellaneous state. */
>  	unsigned long		i_flags;	/* see defined flags below */
> @@ -367,6 +366,8 @@ xfs_set_projid(struct xfs_inode *ip,
>  #define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
>  #define __XFS_IFLOCK_BIT	7	 /* inode is being flushed right now */
>  #define XFS_IFLOCK		(1 << __XFS_IFLOCK_BIT)
> +#define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
> +#define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
>  
>  /*
>   * Per-lifetime flags need to be reset when re-using a reclaimable inode during
> Index: xfs/fs/xfs/xfs_inode_item.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode_item.c	2011-11-30 12:59:08.846380520 +0100
> +++ xfs/fs/xfs/xfs_inode_item.c	2011-11-30 12:59:10.296372665 +0100
> @@ -555,7 +555,7 @@ xfs_inode_item_unpin(
>  	trace_xfs_inode_unpin(ip, _RET_IP_);
>  	ASSERT(atomic_read(&ip->i_pincount) > 0);
>  	if (atomic_dec_and_test(&ip->i_pincount))
> -		wake_up(&ip->i_ipin_wait);
> +		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
>  }
>  
>  /*
> Index: xfs/fs/xfs/xfs_super.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_super.c	2011-11-30 12:59:08.846380520 +0100
> +++ xfs/fs/xfs/xfs_super.c	2011-11-30 12:59:10.296372665 +0100
> @@ -828,7 +828,6 @@ xfs_fs_inode_init_once(
>  	/* xfs inode */
>  	atomic_set(&ip->i_pincount, 0);
>  	spin_lock_init(&ip->i_flags_lock);
> -	init_waitqueue_head(&ip->i_ipin_wait);
>  
>  	mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
>  		     "xfsino", ip->i_ino);
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode
  2011-12-18 20:00 ` [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode Christoph Hellwig
@ 2012-01-16 18:32   ` Ben Myers
  2012-01-16 19:45     ` Ben Myers
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Myers @ 2012-01-16 18:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:11PM -0500, Christoph Hellwig wrote:
> There is no fundamental need to keep an in-memory inode size copy in the XFS
> inode.  We already have the on-disk value in the dinode, and the separate
> in-memory copy that we need for regular files only in the XFS inode.
> 
> Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
> VFS inode i_size field for regular files.  Switch code that was directly
> accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
> we are limited to regular files direct access of the VFS inode i_size field.
> 
> This also allows dropping some fairly complicated code in the write path
> which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
> that is getting updated inside ->write_end.
> 
> Note that we do not bother resetting the VFS i_size when truncating a file
> that gets freed to zero as there is no point in doing so because the VFS inode
> is no longer in use at this point.  Just relax the assert in xfs_ifree to
> only check the on-disk size instead.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This looks good to me too.  The only suggestion I had was that in some
of these places where we call XFS_ISIZE or i_size_read twice in a row,
it might be nicer to read them into a local variable and use that.
Dave's comments were very helpful.
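
To show what I mean by the local variable, here is a sketch modelled on the
di_size clamp in xfs_aio_write_newsize_update.  The struct and the
model_i_size_read helper are hypothetical userspace stand-ins for the inode
and for i_size_read(VFS_I(ip)).

```c
#include <assert.h>

/* Hypothetical stand-in for the two sizes involved. */
struct model_sized_inode {
	long long	di_size;	/* on-disk size */
	long long	vfs_size;	/* VFS in-core size */
};

/* Models i_size_read(VFS_I(ip)). */
static inline long long
model_i_size_read(const struct model_sized_inode *ip)
{
	return ip->vfs_size;
}

/* Clamp the on-disk size to the in-core size, reading the latter once. */
static inline void
model_clamp_disk_size(struct model_sized_inode *ip)
{
	long long	isize = model_i_size_read(ip);

	if (ip->di_size > isize)
		ip->di_size = isize;

	assert(ip->di_size <= isize);
}
```

Besides being shorter, a single read means the comparison and the
assignment are guaranteed to use the same value.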

Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
>  fs/xfs/xfs_aops.c        |    2 +-
>  fs/xfs/xfs_bmap.c        |   15 ++++++---------
>  fs/xfs/xfs_file.c        |   45 +++++++++++----------------------------------
>  fs/xfs/xfs_fs_subr.c     |    2 +-
>  fs/xfs/xfs_iget.c        |    1 -
>  fs/xfs/xfs_inode.c       |    8 ++------
>  fs/xfs/xfs_inode.h       |   16 ++++++++++++----
>  fs/xfs/xfs_iomap.c       |   12 ++++++------
>  fs/xfs/xfs_iops.c        |    3 +--
>  fs/xfs/xfs_qm_syscalls.c |    1 -
>  fs/xfs/xfs_trace.h       |    2 +-
>  fs/xfs/xfs_vnodeops.c    |   31 +++++++++++++++----------------
>  12 files changed, 56 insertions(+), 82 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_aops.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_aops.c	2011-12-02 19:39:30.743827732 +0100
> +++ xfs/fs/xfs/xfs_aops.c	2011-12-07 11:18:04.305981907 +0100
> @@ -111,7 +111,7 @@ xfs_ioend_new_eof(
>  	xfs_fsize_t		bsize;
>  
>  	bsize = ioend->io_offset + ioend->io_size;
> -	isize = MAX(ip->i_size, ip->i_new_size);
> +	isize = MAX(i_size_read(VFS_I(ip)), ip->i_new_size);
>  	isize = MIN(isize, bsize);
>  	return isize > ip->i_d.di_size ? isize : 0;
>  }
> Index: xfs/fs/xfs/xfs_file.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_file.c	2011-12-02 19:39:30.753827732 +0100
> +++ xfs/fs/xfs/xfs_file.c	2011-12-07 11:18:04.305981907 +0100
> @@ -327,7 +327,7 @@ xfs_file_aio_read(
>  				mp->m_rtdev_targp : mp->m_ddev_targp;
>  		if ((iocb->ki_pos & target->bt_smask) ||
>  		    (size & target->bt_smask)) {
> -			if (iocb->ki_pos == ip->i_size)
> +			if (iocb->ki_pos == i_size_read(inode))
>  				return 0;
>  			return -XFS_ERROR(EINVAL);
>  		}
> @@ -412,30 +412,6 @@ xfs_file_splice_read(
>  	return ret;
>  }
>  
> -STATIC void
> -xfs_aio_write_isize_update(
> -	struct inode	*inode,
> -	loff_t		*ppos,
> -	ssize_t		bytes_written)
> -{
> -	struct xfs_inode	*ip = XFS_I(inode);
> -	xfs_fsize_t		isize = i_size_read(inode);
> -
> -	if (bytes_written > 0)
> -		XFS_STATS_ADD(xs_write_bytes, bytes_written);
> -
> -	if (unlikely(bytes_written < 0 && bytes_written != -EFAULT &&
> -					*ppos > isize))
> -		*ppos = isize;
> -
> -	if (*ppos > ip->i_size) {
> -		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> -		if (*ppos > ip->i_size)
> -			ip->i_size = *ppos;
> -		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
> -	}
> -}
> -
>  /*
>   * If this was a direct or synchronous I/O that failed (such as ENOSPC) then
>   * part of the I/O may have been written to disk before the error occurred.  In
> @@ -451,8 +427,8 @@ xfs_aio_write_newsize_update(
>  		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
>  		if (new_size == ip->i_new_size)
>  			ip->i_new_size = 0;
> -		if (ip->i_d.di_size > ip->i_size)
> -			ip->i_d.di_size = ip->i_size;
> +		if (ip->i_d.di_size > i_size_read(VFS_I(ip)))
> +			ip->i_d.di_size = i_size_read(VFS_I(ip));
>  		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
>  	}
>  }
> @@ -492,15 +468,16 @@ xfs_file_splice_write(
>  	new_size = *ppos + count;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	if (new_size > ip->i_size)
> +	if (new_size > i_size_read(inode))
>  		ip->i_new_size = new_size;
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  
>  	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
>  
>  	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
> +	if (ret > 0)
> +		XFS_STATS_ADD(xs_write_bytes, ret);
>  
> -	xfs_aio_write_isize_update(inode, ppos, ret);
>  	xfs_aio_write_newsize_update(ip, new_size);
>  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
>  	return ret;
> @@ -728,14 +705,14 @@ restart:
>  	 * values are still valid.
>  	 */
>  	if ((ip->i_new_size && *pos > ip->i_new_size) ||
> -	    (!ip->i_new_size && *pos > ip->i_size)) {
> +	    (!ip->i_new_size && *pos > i_size_read(inode))) {
>  		if (*iolock == XFS_IOLOCK_SHARED) {
>  			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
>  			*iolock = XFS_IOLOCK_EXCL;
>  			xfs_rw_ilock(ip, XFS_ILOCK_EXCL | *iolock);
>  			goto restart;
>  		}
> -		error = -xfs_zero_eof(ip, *pos, ip->i_size);
> +		error = -xfs_zero_eof(ip, *pos, i_size_read(inode));
>  	}
>  
>  	/*
> @@ -744,7 +721,7 @@ restart:
>  	 * ip->i_new_size if this IO ends beyond any other in-flight writes.
>  	 */
>  	new_size = *pos + *count;
> -	if (new_size > ip->i_size) {
> +	if (new_size > i_size_read(inode)) {
>  		if (new_size > ip->i_new_size)
>  			ip->i_new_size = new_size;
>  		*new_sizep = new_size;
> @@ -957,11 +934,11 @@ xfs_file_aio_write(
>  		ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
>  						ocount, &new_size, &iolock);
>  
> -	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
> -
>  	if (ret <= 0)
>  		goto out_unlock;
>  
> +	XFS_STATS_ADD(xs_write_bytes, ret);
> +
>  	/* Handle various SYNC-type writes */
>  	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
>  		loff_t end = pos + ret - 1;
> Index: xfs/fs/xfs/xfs_fs_subr.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_fs_subr.c	2011-12-02 19:39:30.763827732 +0100
> +++ xfs/fs/xfs/xfs_fs_subr.c	2011-12-07 11:18:04.305981907 +0100
> @@ -90,7 +90,7 @@ xfs_wait_on_pages(
>  
>  	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
>  		return -filemap_fdatawait_range(mapping, first,
> -					last == -1 ? ip->i_size - 1 : last);
> +					last == -1 ? XFS_ISIZE(ip) - 1 : last);
>  	}
>  	return 0;
>  }
> Index: xfs/fs/xfs/xfs_iops.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_iops.c	2011-12-07 11:15:33.600131684 +0100
> +++ xfs/fs/xfs/xfs_iops.c	2011-12-07 11:18:04.309315222 +0100
> @@ -778,7 +778,7 @@ xfs_setattr_size(
>  		lock_flags |= XFS_IOLOCK_EXCL;
>  	xfs_ilock(ip, lock_flags);
>  
> -	oldsize = ip->i_size;
> +	oldsize = inode->i_size;
>  	newsize = iattr->ia_size;
>  
>  	/*
> @@ -897,7 +897,6 @@ xfs_setattr_size(
>  	 * they get written to.
>  	 */
>  	ip->i_d.di_size = newsize;
> -	ip->i_size = newsize;
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>  
>  	if (newsize <= oldsize) {
> Index: xfs/fs/xfs/xfs_trace.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_trace.h	2011-12-07 11:17:02.346317572 +0100
> +++ xfs/fs/xfs/xfs_trace.h	2011-12-07 11:18:04.309315222 +0100
> @@ -1038,7 +1038,7 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
>  	TP_fast_assign(
>  		__entry->dev = VFS_I(ip)->i_sb->s_dev;
>  		__entry->ino = ip->i_ino;
> -		__entry->isize = ip->i_size;
> +		__entry->isize = VFS_I(ip)->i_size;
>  		__entry->disize = ip->i_d.di_size;
>  		__entry->new_size = ip->i_new_size;
>  		__entry->offset = offset;
> Index: xfs/fs/xfs/xfs_bmap.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_bmap.c	2011-12-07 11:17:02.339650941 +0100
> +++ xfs/fs/xfs/xfs_bmap.c	2011-12-07 11:18:04.312648538 +0100
> @@ -3997,11 +3997,8 @@ xfs_bmap_one_block(
>  	xfs_bmbt_irec_t	s;		/* internal version of extent */
>  
>  #ifndef DEBUG
> -	if (whichfork == XFS_DATA_FORK) {
> -		return S_ISREG(ip->i_d.di_mode) ?
> -			(ip->i_size == ip->i_mount->m_sb.sb_blocksize) :
> -			(ip->i_d.di_size == ip->i_mount->m_sb.sb_blocksize);
> -	}
> +	if (whichfork == XFS_DATA_FORK)
> +		return XFS_ISIZE(ip) == ip->i_mount->m_sb.sb_blocksize;
>  #endif	/* !DEBUG */
>  	if (XFS_IFORK_NEXTENTS(ip, whichfork) != 1)
>  		return 0;
> @@ -4013,7 +4010,7 @@ xfs_bmap_one_block(
>  	xfs_bmbt_get_all(ep, &s);
>  	rval = s.br_startoff == 0 && s.br_blockcount == 1;
>  	if (rval && whichfork == XFS_DATA_FORK)
> -		ASSERT(ip->i_size == ip->i_mount->m_sb.sb_blocksize);
> +		ASSERT(XFS_ISIZE(ip) == ip->i_mount->m_sb.sb_blocksize);
>  	return rval;
>  }
>  
> @@ -5425,7 +5422,7 @@ xfs_getbmapx_fix_eof_hole(
>  	if (startblock == HOLESTARTBLOCK) {
>  		mp = ip->i_mount;
>  		out->bmv_block = -1;
> -		fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, ip->i_size));
> +		fixlen = XFS_FSB_TO_BB(mp, XFS_B_TO_FSB(mp, XFS_ISIZE(ip)));
>  		fixlen -= out->bmv_offset;
>  		if (prealloced && out->bmv_offset + out->bmv_length == end) {
>  			/* Came to hole at EOF. Trim it. */
> @@ -5513,7 +5510,7 @@ xfs_getbmap(
>  			fixlen = XFS_MAXIOFFSET(mp);
>  		} else {
>  			prealloced = 0;
> -			fixlen = ip->i_size;
> +			fixlen = XFS_ISIZE(ip);
>  		}
>  	}
>  
> @@ -5542,7 +5539,7 @@ xfs_getbmap(
>  
>  	xfs_ilock(ip, XFS_IOLOCK_SHARED);
>  	if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) {
> -		if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
> +		if (ip->i_delayed_blks || XFS_ISIZE(ip) > ip->i_d.di_size) {
>  			error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
>  			if (error)
>  				goto out_unlock_iolock;
> Index: xfs/fs/xfs/xfs_iget.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_iget.c	2011-12-07 11:17:37.032796326 +0100
> +++ xfs/fs/xfs/xfs_iget.c	2011-12-07 11:18:04.312648538 +0100
> @@ -94,7 +94,6 @@ xfs_inode_alloc(
>  	ip->i_update_core = 0;
>  	ip->i_delayed_blks = 0;
>  	memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
> -	ip->i_size = 0;
>  	ip->i_new_size = 0;
>  
>  	return ip;
> Index: xfs/fs/xfs/xfs_inode.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.c	2011-12-07 11:17:50.642722594 +0100
> +++ xfs/fs/xfs/xfs_inode.c	2011-12-07 11:18:04.315981854 +0100
> @@ -347,7 +347,6 @@ xfs_iformat(
>  			return XFS_ERROR(EFSCORRUPTED);
>  		}
>  		ip->i_d.di_size = 0;
> -		ip->i_size = 0;
>  		ip->i_df.if_u2.if_rdev = xfs_dinode_get_rdev(dip);
>  		break;
>  
> @@ -853,7 +852,6 @@ xfs_iread(
>  	}
>  
>  	ip->i_delayed_blks = 0;
> -	ip->i_size = ip->i_d.di_size;
>  
>  	/*
>  	 * Mark the buffer containing the inode as something to keep
> @@ -1043,7 +1041,6 @@ xfs_ialloc(
>  	}
>  
>  	ip->i_d.di_size = 0;
> -	ip->i_size = 0;
>  	ip->i_d.di_nextents = 0;
>  	ASSERT(ip->i_d.di_nblocks == 0);
>  
> @@ -1198,7 +1195,7 @@ xfs_itruncate_extents(
>  	int			done = 0;
>  
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL));
> -	ASSERT(new_size <= ip->i_size);
> +	ASSERT(new_size <= XFS_ISIZE(ip));
>  	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>  	ASSERT(ip->i_itemp != NULL);
>  	ASSERT(ip->i_itemp->ili_lock_flags == 0);
> @@ -1712,8 +1709,7 @@ xfs_ifree(
>  	ASSERT(ip->i_d.di_nlink == 0);
>  	ASSERT(ip->i_d.di_nextents == 0);
>  	ASSERT(ip->i_d.di_anextents == 0);
> -	ASSERT((ip->i_d.di_size == 0 && ip->i_size == 0) ||
> -	       (!S_ISREG(ip->i_d.di_mode)));
> +	ASSERT(ip->i_d.di_size == 0 || !S_ISREG(ip->i_d.di_mode));
>  	ASSERT(ip->i_d.di_nblocks == 0);
>  
>  	/*
> Index: xfs/fs/xfs/xfs_iomap.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_iomap.c	2011-12-07 11:15:53.693356164 +0100
> +++ xfs/fs/xfs/xfs_iomap.c	2011-12-07 11:18:04.315981854 +0100
> @@ -74,7 +74,7 @@ xfs_iomap_eof_align_last_fsb(
>  		else if (mp->m_dalign)
>  			align = mp->m_dalign;
>  
> -		if (align && ip->i_size >= XFS_FSB_TO_B(mp, align))
> +		if (align && XFS_ISIZE(ip) >= XFS_FSB_TO_B(mp, align))
>  			new_last_fsb = roundup_64(*last_fsb, align);
>  	}
>  
> @@ -154,7 +154,7 @@ xfs_iomap_write_direct(
>  
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
> -	if ((offset + count) > ip->i_size) {
> +	if ((offset + count) > XFS_ISIZE(ip)) {
>  		error = xfs_iomap_eof_align_last_fsb(mp, ip, extsz, &last_fsb);
>  		if (error)
>  			goto error_out;
> @@ -211,7 +211,7 @@ xfs_iomap_write_direct(
>  	xfs_trans_ijoin(tp, ip, 0);
>  
>  	bmapi_flag = 0;
> -	if (offset < ip->i_size || extsz)
> +	if (offset < XFS_ISIZE(ip) || extsz)
>  		bmapi_flag |= XFS_BMAPI_PREALLOC;
>  
>  	/*
> @@ -286,7 +286,7 @@ xfs_iomap_eof_want_preallocate(
>  	int		found_delalloc = 0;
>  
>  	*prealloc = 0;
> -	if ((offset + count) <= ip->i_size)
> +	if (offset + count <= XFS_ISIZE(ip))
>  		return 0;
>  
>  	/*
> @@ -340,7 +340,7 @@ xfs_iomap_prealloc_size(
>  		 * if we pass in alloc_blocks = 0. Hence the "+ 1" to
>  		 * ensure we always pass in a non-zero value.
>  		 */
> -		alloc_blocks = XFS_B_TO_FSB(mp, ip->i_size) + 1;
> +		alloc_blocks = XFS_B_TO_FSB(mp, XFS_ISIZE(ip)) + 1;
>  		alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
>  					rounddown_pow_of_two(alloc_blocks));
>  
> @@ -564,7 +564,7 @@ xfs_iomap_write_allocate(
>  			 * back....
>  			 */
>  			nimaps = 1;
> -			end_fsb = XFS_B_TO_FSB(mp, ip->i_size);
> +			end_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
>  			error = xfs_bmap_last_offset(NULL, ip, &last_block,
>  							XFS_DATA_FORK);
>  			if (error)
> Index: xfs/fs/xfs/xfs_vnodeops.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_vnodeops.c	2011-12-07 11:15:33.603464999 +0100
> +++ xfs/fs/xfs/xfs_vnodeops.c	2011-12-07 11:18:04.315981854 +0100
> @@ -175,7 +175,7 @@ xfs_free_eofblocks(
>  	 * Figure out if there are any blocks beyond the end
>  	 * of the file.  If not, then there is nothing to do.
>  	 */
> -	end_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)ip->i_size));
> +	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
>  	last_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_MAXIOFFSET(mp));
>  	if (last_fsb <= end_fsb)
>  		return 0;
> @@ -233,7 +233,7 @@ xfs_free_eofblocks(
>  		 * may be full of holes (ie NULL files bug).
>  		 */
>  		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK,
> -					      ip->i_size);
> +					      XFS_ISIZE(ip));
>  		if (error) {
>  			/*
>  			 * If we get an error at this point we simply don't
> @@ -547,8 +547,8 @@ xfs_release(
>  		return 0;
>  
>  	if ((S_ISREG(ip->i_d.di_mode) &&
> -	     ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
> -	       ip->i_delayed_blks > 0)) &&
> +	     (VFS_I(ip)->i_size > 0 ||
> +	      (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) &&
>  	     (ip->i_df.if_flags & XFS_IFEXTENTS))  &&
>  	    (!(ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) {
>  
> @@ -625,7 +625,7 @@ xfs_inactive(
>  	 * only one with a reference to the inode.
>  	 */
>  	truncate = ((ip->i_d.di_nlink == 0) &&
> -	    ((ip->i_d.di_size != 0) || (ip->i_size != 0) ||
> +	    ((ip->i_d.di_size != 0) || XFS_ISIZE(ip) != 0 ||
>  	     (ip->i_d.di_nextents > 0) || (ip->i_delayed_blks > 0)) &&
>  	    S_ISREG(ip->i_d.di_mode));
>  
> @@ -639,12 +639,12 @@ xfs_inactive(
>  
>  	if (ip->i_d.di_nlink != 0) {
>  		if ((S_ISREG(ip->i_d.di_mode) &&
> -                     ((ip->i_size > 0) || (VN_CACHED(VFS_I(ip)) > 0 ||
> -                       ip->i_delayed_blks > 0)) &&
> -		      (ip->i_df.if_flags & XFS_IFEXTENTS) &&
> -		     (!(ip->i_d.di_flags &
> +		    (VFS_I(ip)->i_size > 0 ||
> +		     (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) &&
> +		    (ip->i_df.if_flags & XFS_IFEXTENTS) &&
> +		    (!(ip->i_d.di_flags &
>  				(XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) ||
> -		      (ip->i_delayed_blks != 0)))) {
> +		     ip->i_delayed_blks != 0))) {
>  			error = xfs_free_eofblocks(mp, ip, 0);
>  			if (error)
>  				return VN_INACTIVE_CACHE;
> @@ -678,7 +678,6 @@ xfs_inactive(
>  		xfs_trans_ijoin(tp, ip, 0);
>  
>  		ip->i_d.di_size = 0;
> -		ip->i_size = 0;
>  		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>  
>  		error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
> @@ -1974,11 +1973,11 @@ xfs_zero_remaining_bytes(
>  	 * since nothing can read beyond eof.  The space will
>  	 * be zeroed when the file is extended anyway.
>  	 */
> -	if (startoff >= ip->i_size)
> +	if (startoff >= XFS_ISIZE(ip))
>  		return 0;
>  
> -	if (endoff > ip->i_size)
> -		endoff = ip->i_size;
> +	if (endoff > XFS_ISIZE(ip))
> +		endoff = XFS_ISIZE(ip);
>  
>  	bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
>  					mp->m_rtdev_targp : mp->m_ddev_targp,
> @@ -2273,7 +2272,7 @@ xfs_change_file_space(
>  		bf->l_start += offset;
>  		break;
>  	case 2: /*SEEK_END*/
> -		bf->l_start += ip->i_size;
> +		bf->l_start += XFS_ISIZE(ip);
>  		break;
>  	default:
>  		return XFS_ERROR(EINVAL);
> @@ -2290,7 +2289,7 @@ xfs_change_file_space(
>  	bf->l_whence = 0;
>  
>  	startoffset = bf->l_start;
> -	fsize = ip->i_size;
> +	fsize = XFS_ISIZE(ip);
>  
>  	/*
>  	 * XFS_IOC_RESVSP and XFS_IOC_UNRESVSP will reserve or unreserve
> Index: xfs/fs/xfs/xfs_inode.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.h	2011-12-07 11:17:50.642722594 +0100
> +++ xfs/fs/xfs/xfs_inode.h	2011-12-07 11:18:04.319315170 +0100
> @@ -246,16 +246,12 @@ typedef struct xfs_inode {
>  
>  	xfs_icdinode_t		i_d;		/* most of ondisk inode */
>  
> -	xfs_fsize_t		i_size;		/* in-memory size */
>  	xfs_fsize_t		i_new_size;	/* size when write completes */
>  
>  	/* VFS inode */
>  	struct inode		i_vnode;	/* embedded VFS inode */
>  } xfs_inode_t;
>  
> -#define XFS_ISIZE(ip)	S_ISREG((ip)->i_d.di_mode) ? \
> -				(ip)->i_size : (ip)->i_d.di_size;
> -
>  /* Convert from vfs inode to xfs inode */
>  static inline struct xfs_inode *XFS_I(struct inode *inode)
>  {
> @@ -269,6 +265,18 @@ static inline struct inode *VFS_I(struct
>  }
>  
>  /*
> + * For regular files we only update the on-disk filesize when actually
> + * writing data back to disk.  Until then only the copy in the VFS inode
> + * is uptodate.
> + */
> +static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
> +{
> +	if (S_ISREG(ip->i_d.di_mode))
> +		return i_size_read(VFS_I(ip));
> +	return ip->i_d.di_size;
> +}
> +
> +/*
>   * i_flags helper functions
>   */
>  static inline void
> Index: xfs/fs/xfs/xfs_qm_syscalls.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_qm_syscalls.c	2011-12-07 11:15:33.600131684 +0100
> +++ xfs/fs/xfs/xfs_qm_syscalls.c	2011-12-07 11:18:04.319315170 +0100
> @@ -265,7 +265,6 @@ xfs_qm_scall_trunc_qfile(
>  	xfs_trans_ijoin(tp, ip, 0);
>  
>  	ip->i_d.di_size = 0;
> -	ip->i_size = 0;
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>  
>  	error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, 0);
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode
  2012-01-16 18:32   ` Ben Myers
@ 2012-01-16 19:45     ` Ben Myers
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-16 19:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Jan 16, 2012 at 12:32:01PM -0600, Ben Myers wrote:
> On Sun, Dec 18, 2011 at 03:00:11PM -0500, Christoph Hellwig wrote:
> > There is no fundamental need to keep an in-memory inode size copy in the XFS
> > inode.  We already have the on-disk value in the dinode, and the separate
> > in-memory copy that we need for regular files only in the XFS inode.
> > 
> > Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
> > VFS inode i_size field for regular fields.  Switch code that was directly
				       files.

I'll fix that up.

-Ben

> > accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
> > we are limited to regular files direct access of the VFS inode i_size field.
> > 
> > This also allows dropping some fairly complicated code in the write path
> > which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
> > that is getting updated inside ->write_end.
> > 
> > Note that we do not bother resetting the VFS i_size when truncating a file
> > that gets freed to zero as there is no point in doing so because the VFS inode
> > is no longer in use at this point.  Just relax the assert in xfs_ifree to
> > only check the on-disk size instead.
> > 
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> This looks good to me too.  The only suggestion I had was that in some
> of these places where we call XFS_ISIZE or i_size_read twice in a row,
> it might be nicer to read them into a local variable and use that.
> Dave's comments were very helpful.
> 
> Reviewed-by: Ben Myers <bpm@sgi.com>
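
A minimal userspace sketch of the local-variable suggestion above.  It is
not kernel code: struct mock_inode, mock_isize() and clamp_range() are
hypothetical stand-ins, and only the shape of the change mirrors the
xfs_zero_remaining_bytes hunk, where XFS_ISIZE(ip) is evaluated up to three
times in a row.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the XFS inode; not the kernel struct. */
struct mock_inode {
	bool		is_regular;	/* models S_ISREG(ip->i_d.di_mode) */
	long long	vfs_size;	/* models the VFS inode i_size */
	long long	disk_size;	/* models ip->i_d.di_size */
};

/* Models XFS_ISIZE(): VFS size for regular files, on-disk size otherwise. */
static inline long long mock_isize(const struct mock_inode *ip)
{
	return ip->is_regular ? ip->vfs_size : ip->disk_size;
}

/*
 * Clamp *endoff to the current file size.  Returns false when the range
 * starts at or beyond EOF, i.e. when there is nothing to do.
 */
static bool clamp_range(const struct mock_inode *ip, long long startoff,
			long long *endoff)
{
	const long long isize = mock_isize(ip);	/* read the size once */

	if (startoff >= isize)
		return false;
	if (*endoff > isize)
		*endoff = isize;
	return true;
}
```

Reading the size once also means both comparisons and the assignment see
the same value, which is the readability point being made; whether it
matters for correctness in the kernel depends on the locking at each call
site and is not something this sketch tries to model.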


* Re: [PATCH 09/11] xfs: remove the i_new_size field in struct xfs_inode
  2011-12-18 20:00 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
  2011-12-18 22:13   ` Dave Chinner
@ 2012-01-16 22:41   ` Ben Myers
  2012-01-17 20:14   ` Ben Myers
  2 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-16 22:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:12PM -0500, Christoph Hellwig wrote:
> Now that we use the VFS i_size field throughout XFS there is no need for the
> i_new_size field any more given that the VFS i_size field gets updated
> in ->write_end before unlocking the page, and thus is always uptodate when
> writeback could see a page.  Removing i_new_size also has the advantage that
> we will never have to trim back di_size during a failed buffered write,
> given that it never gets updated past i_size.
> 
> Note that currently the generic direct I/O code only updates i_size after
> calling our end_io handler, which requires a small workaround to make
> sure di_size actually makes it to disk.  I hope to fix this properly in
> the generic code.
> 
> A downside is that we lose the support for parallel non-overlapping O_DIRECT
> appending writes that recently was added.  I don't think keeping the complex
> and fragile i_new_size infrastructure for this is a good tradeoff - if we
> really care about parallel appending writers we should investigate turning
> the iolock into a range lock, which would also allow for parallel
> non-overlapping buffered writers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> ---
>  fs/xfs/xfs_aops.c  |   28 +++++++++++---------
>  fs/xfs/xfs_file.c  |   72 +++++++----------------------------------------------
>  fs/xfs/xfs_iget.c  |    1 
>  fs/xfs/xfs_inode.h |    2 -
>  fs/xfs/xfs_trace.h |   18 ++-----------
>  5 files changed, 29 insertions(+), 92 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_file.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_file.c	2011-11-30 12:59:11.669698558 +0100
> +++ xfs/fs/xfs/xfs_file.c	2011-11-30 12:59:13.533021797 +0100
> @@ -413,27 +413,6 @@ xfs_file_splice_read(
>  }
>  
>  /*
> - * If this was a direct or synchronous I/O that failed (such as ENOSPC) then
> - * part of the I/O may have been written to disk before the error occurred.  In
> - * this case the on-disk file size may have been adjusted beyond the in-memory
> - * file size and now needs to be truncated back.
> - */
> -STATIC void
> -xfs_aio_write_newsize_update(
> -	struct xfs_inode	*ip,
> -	xfs_fsize_t		new_size)
> -{
> -	if (new_size == ip->i_new_size) {

Ouch.  If I'm reading this right the behavior prior to this patch is a
little messed up...

xfs_file_aio_write
  new_size = 0
  xfs_file_buffered_aio_write(&new_size
    xfs_file_aio_write_checks - for a non-extending write it won't touch
    				*new_sizep
  generic_file_buffered_write - ...
  xfs_aio_write_isize_update - doesn't touch new_size
  xfs_aio_write_newsize_update:

STATIC void     
xfs_aio_write_newsize_update(
        struct xfs_inode        *ip, 
        xfs_fsize_t             new_size)
{                                               
        if (new_size == ip->i_new_size) {	 <--- 0 == 0 
                xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
                if (new_size == ip->i_new_size) 
                        ip->i_new_size = 0;
                if (ip->i_d.di_size > ip->i_size)
                        ip->i_d.di_size = ip->i_size;
                xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
        }       
}

AFAICT even for non-extending writes we are taking the ilock exclusive
to test (ip->i_d.di_size > ip->i_size).  That does not seem necessary,
correct?
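
A minimal userspace sketch of the call trace above, assuming the control
flow of the removed xfs_aio_write_newsize_update() is exactly as quoted.
struct mock_inode and the lock counter are hypothetical; the counter only
records how often the exclusive ilock would have been taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the size fields involved; not the kernel struct. */
struct mock_inode {
	long long	i_size;		/* old in-memory size */
	long long	i_new_size;	/* pending size of an extending write, else 0 */
	long long	di_size;	/* on-disk size */
	int		ilock_excl_count; /* times the exclusive ilock was taken */
};

/* Same control flow as the removed xfs_aio_write_newsize_update(). */
static void mock_newsize_update(struct mock_inode *ip, long long new_size)
{
	if (new_size == ip->i_new_size) {
		ip->ilock_excl_count++;	/* models xfs_rw_ilock(ip, XFS_ILOCK_EXCL) */
		if (new_size == ip->i_new_size)
			ip->i_new_size = 0;
		if (ip->di_size > ip->i_size)
			ip->di_size = ip->i_size;
	}
}

/*
 * Non-extending write with no other writer in flight: the caller's
 * new_size stays 0 and ip->i_new_size is 0 as well, so 0 == 0 matches.
 */
static int locks_for_non_extending_write(void)
{
	struct mock_inode ip = { .i_size = 4096, .i_new_size = 0,
				 .di_size = 4096 };

	mock_newsize_update(&ip, 0);
	return ip.ilock_excl_count;
}
```

In this model the lock is taken once even though none of the sizes change,
and it is skipped only when another extending write has left i_new_size
non-zero.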

This is not an issue with your patch... I just want to make sure I
understand. 

Thanks,
Ben


* Re: [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork
  2012-01-06 16:58   ` Ben Myers
@ 2012-01-16 22:45     ` Ben Myers
  2012-01-17 15:16       ` Ben Myers
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Myers @ 2012-01-16 22:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Hey Christoph,

On Fri, Jan 06, 2012 at 10:58:18AM -0600, Ben Myers wrote:
> On Sun, Dec 18, 2011 at 03:00:07PM -0500, Christoph Hellwig wrote:
> > We spent a lot of effort to maintain this field, but it always equalts to the
> 								 equals the
> > fork size divided by the constant size of an extent.  The prime use of it is
> > to assert that the two stay in sync.  Just divide the fork size by the extent
> > size in the few places that we actually use it and remove the overhead
> > of maintaining it.  Also introduce a few helpers to consolidate the places
> > where we actually care about the value.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> After reviewing this patch it's not crystal clear to me why we were
> putting all that effort into keeping this counter uptodate on the inode
> instead of using helpers like you've implemented.  Maybe a question of
> integer division as Dave suggested.  This is a nice improvement.
> 
> > Index: xfs/fs/xfs/xfs_bmap.c
> > ===================================================================
> > --- xfs.orig/fs/xfs/xfs_bmap.c	2011-12-12 10:33:55.748696870 -0800
> > +++ xfs/fs/xfs/xfs_bmap.c	2011-12-14 05:15:20.612373687 -0800
> > @@ -249,7 +249,27 @@ xfs_bmbt_lookup_ge(
> >  }
> >  
> >  /*
> > -* Update the record referred to by cur to the value given
> > + * Check if the inode needs to be converted to btree format.
> > + */
> > +static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
> > +{
> > +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> > +		XFS_IFORK_NEXTENTS(ip, whichfork) >
> > +			XFS_IFORK_MAXEXT(ip, whichfork);
> > +}
> > +
> > +/*
> > + * Check if the inode should be converted to extent format.
> > + */
> > +static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
> > +{
> > +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
> > +		XFS_IFORK_NEXTENTS(ip, whichfork) <=
> > +			XFS_IFORK_MAXEXT(ip, whichfork);
> > +}
> 
> The logic in these two appears to be equivalent to the code you've
> replaced in all but one case...
> 
> ...
> 
> > @@ -5321,8 +5318,7 @@ xfs_bunmapi(
> >  		 * will be dirty.
> >  		 */
> >  		if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
> > -		    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> > -		    XFS_IFORK_NEXTENTS(ip, whichfork) >= ifp->if_ext_max &&
> 						      ^^ 
> All other tests for this were:
> XFS_IFORK_NEXTENTS(ip, whichfork) > ifp->if_ext_max
> 
> Did you just fix a lurking off-by-one or insert one?
> 
> xfs_bmap_needs_btree needs ip->i_d.di_nextents to have been incremented 
> already in order to detect that we need to convert to btree format.  In
> this case we haven't done that yet and are checking to see if doing so
> would require conversion to btree format... 
> 
> Looks to me like we can't use xfs_bmap_needs_btree here and should use
> the old logic.  Right?

HCH, I have a question for you here that I feel needs to be resolved.
Can you take a look?

Thanks,
	Ben


* Re: [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork
  2012-01-16 22:45     ` Ben Myers
@ 2012-01-17 15:16       ` Ben Myers
  2012-01-17 17:04         ` Mark Tinguely
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Myers @ 2012-01-17 15:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Jan 16, 2012 at 04:45:27PM -0600, Ben Myers wrote:
> Hey Christoph,
> 
> On Fri, Jan 06, 2012 at 10:58:18AM -0600, Ben Myers wrote:
> > On Sun, Dec 18, 2011 at 03:00:07PM -0500, Christoph Hellwig wrote:
> > > We spent a lot of effort to maintain this field, but it always equalts to the
> > 								 equals the
> > > fork size divided by the constant size of an extent.  The prime use of it is
> > > to assert that the two stay in sync.  Just divide the fork size by the extent
> > > size in the few places that we actually use it and remove the overhead
> > > of maintaining it.  Also introduce a few helpers to consolidate the places
> > > where we actually care about the value.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > After reviewing this patch it's not crystal clear to me why we were
> > putting all that effort into keeping this counter uptodate on the inode
> > instead of using helpers like you've implemented.  Maybe a question of
> > integer division as Dave suggested.  This is a nice improvement.
> > 
> > > Index: xfs/fs/xfs/xfs_bmap.c
> > > ===================================================================
> > > --- xfs.orig/fs/xfs/xfs_bmap.c	2011-12-12 10:33:55.748696870 -0800
> > > +++ xfs/fs/xfs/xfs_bmap.c	2011-12-14 05:15:20.612373687 -0800
> > > @@ -249,7 +249,27 @@ xfs_bmbt_lookup_ge(
> > >  }
> > >  
> > >  /*
> > > -* Update the record referred to by cur to the value given
> > > + * Check if the inode needs to be converted to btree format.
> > > + */
> > > +static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
> > > +{
> > > +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> > > +		XFS_IFORK_NEXTENTS(ip, whichfork) >
> > > +			XFS_IFORK_MAXEXT(ip, whichfork);
> > > +}
> > > +
> > > +/*
> > > + * Check if the inode should be converted to extent format.
> > > + */
> > > +static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
> > > +{
> > > +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
> > > +		XFS_IFORK_NEXTENTS(ip, whichfork) <=
> > > +			XFS_IFORK_MAXEXT(ip, whichfork);
> > > +}
> > 
> > The logic in these two appears to be equivalent to the code you've
> > replaced in all but one case...
> > 
> > ...
> > 
> > > @@ -5321,8 +5318,7 @@ xfs_bunmapi(
> > >  		 * will be dirty.
> > >  		 */
> > >  		if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
> > > -		    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> > > -		    XFS_IFORK_NEXTENTS(ip, whichfork) >= ifp->if_ext_max &&
> > 						      ^^ 
> > All other tests for this were:
> > XFS_IFORK_NEXTENTS(ip, whichfork) > ifp->if_ext_max
> > 
> > Did you just fix a lurking off-by-one or insert one?
> > 
> > xfs_bmap_needs_btree needs ip->i_d.di_nextents to have been incremented 
> > already in order to detect that we need to convert to btree format.  In
> > this case we haven't done that yet and are checking to see if doing so
> > would require conversion to btree format... 
> > 
> > Looks to me like we can't use xfs_bmap_needs_btree here and should use
> > the old logic.  Right?
> 
> HCH, I have a question for you here that I feel needs to be resolved.
> Can you take a look?

Here is what I propose to use here:

@@ -5322,7 +5319,8 @@ xfs_bunmapi(
                 */
                if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
                    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
-                   XFS_IFORK_NEXTENTS(ip, whichfork) >= ifp->if_ext_max &&
+                   XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */
+                       XFS_IFORK_MAXEXT(ip, whichfork) &&
                    del.br_startoff > got.br_startoff &&
                    del.br_startoff + del.br_blockcount <
                    got.br_startoff + got.br_blockcount) {
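
A minimal sketch of why the ">=" matters here, modelling only the count
comparison.  The function names are hypothetical, and the sketch assumes
this branch of xfs_bunmapi adds exactly one extent, since it punches out
the middle of an existing one (del lies strictly inside got).

```c
#include <assert.h>
#include <stdbool.h>

/* Post-check, as in xfs_bmap_needs_btree(): count was already incremented. */
static bool overflows_after_insert(int nextents_after, int maxext)
{
	return nextents_after > maxext;
}

/* Pre-check, as in xfs_bunmapi(): the split has not happened yet. */
static bool would_overflow_on_insert(int nextents_before, int maxext)
{
	return nextents_before >= maxext;
}

/* Returns true if the two forms agree for every count in [0, limit]. */
static bool checks_agree(int maxext, int limit)
{
	for (int n = 0; n <= limit; n++)
		if (would_overflow_on_insert(n, maxext) !=
		    overflows_after_insert(n + 1, maxext))
			return false;
	return true;
}
```

Under that assumption "n >= max" before the split and "n + 1 > max" after
it are the same test, so keeping the ">=" preserves the old behaviour,
while calling the ">" helper before the increment would fire one extent
too late.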

-Ben


* Re: [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork
  2012-01-17 15:16       ` Ben Myers
@ 2012-01-17 17:04         ` Mark Tinguely
  0 siblings, 0 replies; 32+ messages in thread
From: Mark Tinguely @ 2012-01-17 17:04 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

On 01/17/12 09:16, Ben Myers wrote:
> On Mon, Jan 16, 2012 at 04:45:27PM -0600, Ben Myers wrote:
>> Hey Christoph,
>>
>> On Fri, Jan 06, 2012 at 10:58:18AM -0600, Ben Myers wrote:
>>> On Sun, Dec 18, 2011 at 03:00:07PM -0500, Christoph Hellwig wrote:
>>>> We spent a lot of effort to maintain this field, but it always equalts to the
>>> 								equals the
>>>> fork size divided by the constant size of an extent.  The prime use of it is
>>>> to assert that the two stay in sync.  Just divide the fork size by the extent
>>>> size in the few places that we actually use it and remove the overhead
>>>> of maintaining it.  Also introduce a few helpers to consolidate the places
>>>> where we actually care about the value.
>>>>
>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>> Reviewed-by: Dave Chinner <dchinner@redhat.com>
>>>
>>> After reviewing this patch it's not crystal clear to me why we were
>>> putting all that effort into keeping this counter uptodate on the inode
>>> instead of using helpers like you've implemented.  Maybe a question of
>>> integer division as Dave suggested.  This is a nice improvement.
>>>
>>>> Index: xfs/fs/xfs/xfs_bmap.c
>>>> ===================================================================
>>>> --- xfs.orig/fs/xfs/xfs_bmap.c	2011-12-12 10:33:55.748696870 -0800
>>>> +++ xfs/fs/xfs/xfs_bmap.c	2011-12-14 05:15:20.612373687 -0800
>>>> @@ -249,7 +249,27 @@ xfs_bmbt_lookup_ge(
>>>>   }
>>>>
>>>>   /*
>>>> -* Update the record referred to by cur to the value given
>>>> + * Check if the inode needs to be converted to btree format.
>>>> + */
>>>> +static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
>>>> +{
>>>> +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
>>>> +		XFS_IFORK_NEXTENTS(ip, whichfork) >
>>>> +			XFS_IFORK_MAXEXT(ip, whichfork);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Check if the inode should be converted to extent format.
>>>> + */
>>>> +static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
>>>> +{
>>>> +	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
>>>> +		XFS_IFORK_NEXTENTS(ip, whichfork) <=
>>>> +			XFS_IFORK_MAXEXT(ip, whichfork);
>>>> +}
>>>
>>> The logic in these two appears to be equivalent to the code you've
>>> replaced in all but one case...
>>>


I am coming late into this review party, and I know this is time 
sensitive. I am looking at this from the big picture and you can ignore me.

Looking at the INTENTION of the tests, IMO, we are asking: "is it time
to change format?" You do not want to flip between formats without the
extent count changing - in other words, the two tests should NOT overlap.

/*
  * Check if the inode should be converted to extent format.
  */
static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
{
	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
-		XFS_IFORK_NEXTENTS(ip, whichfork) <=
+		XFS_IFORK_NEXTENTS(ip, whichfork) <  /* less */
			XFS_IFORK_MAXEXT(ip, whichfork);
}
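
To make the boundary case concrete, here is a small stand-alone sketch of
the two predicates.  The toy_fork struct and the toy_* names are
hypothetical stand-ins for the XFS_IFORK_* macros, not the kernel code;
the "strict" argument selects the "<" variant suggested above instead of
the "<=" from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-fork state the XFS_IFORK_* macros read. */
enum toy_fmt { TOY_FMT_EXTENTS, TOY_FMT_BTREE };

struct toy_fork {
	enum toy_fmt	format;
	int		nextents;	/* current extent count */
	int		maxext;		/* extents that fit in the inode fork */
};

/* Mirrors xfs_bmap_needs_btree: extent format holding too many extents. */
bool toy_needs_btree(const struct toy_fork *f)
{
	assert(f->nextents >= 0 && f->maxext >= 0);
	return f->format == TOY_FMT_EXTENTS && f->nextents > f->maxext;
}

/*
 * Mirrors xfs_bmap_wants_extents.  With strict == false this is the "<="
 * test from the patch; with strict == true it is the "<" variant.
 */
bool toy_wants_extents(const struct toy_fork *f, bool strict)
{
	assert(f->nextents >= 0 && f->maxext >= 0);
	if (f->format != TOY_FMT_BTREE)
		return false;
	return strict ? f->nextents < f->maxext : f->nextents <= f->maxext;
}
```

For a btree-format fork holding exactly maxext extents, the "<=" form asks
for conversion back to extent format while the "<" form leaves the fork
alone.  In this model the two predicates never hold for the same fork,
because each one requires a different current format.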


>>> ...
>>>
>>>> @@ -5321,8 +5318,7 @@ xfs_bunmapi(
>>>>   		 * will be dirty.
>>>>   		 */
>>>>   		if (!wasdel&&  xfs_trans_get_block_res(tp) == 0&&
>>>> -		    XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS&&
>>>> -		    XFS_IFORK_NEXTENTS(ip, whichfork)>= ifp->if_ext_max&&
>>> 						^^
>>> All other tests for this were:
>>> XFS_IFORK_NEXTENTS(ip, whichfork)>  ifp->if_ext_max
>>>
>>> Did you just fix a lurking off-by-one or insert one?
>>>
>>> xfs_bmap_needs_btree needs ip->i_d.di_nextents to have been incremented
>>> already in order to detect that we need to convert to btree format.  In
>>> this case we haven't done that yet and are checking to see if doing so
>>> would require conversion to btree format...
>>>
>>> Looks to me like we can't use xfs_bmap_needs_btree here and should use
>>> the old logic.  Right?
>>
>> HCH, I have a question for you here that I feel needs to be resolved.
>> Can you take a look?
>
> Here is what I propose to use here:
>
> @@ -5322,7 +5319,8 @@ xfs_bunmapi(
>                   */
>                  if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
>                      XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> -                   XFS_IFORK_NEXTENTS(ip, whichfork) >= ifp->if_ext_max &&
> +                   XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */
> +                       XFS_IFORK_MAXEXT(ip, whichfork) &&
>                      del.br_startoff > got.br_startoff &&
>                      del.br_startoff + del.br_blockcount <
>                      got.br_startoff + got.br_blockcount) {
>
> -Ben


The original "XFS_IFORK_NEXTENTS(ip, whichfork)>= ifp->if_ext_max" is 
important because the removal of the blocks in xfs_bmap_del_extent() 
will create a hole that requires an insertion.
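
The arithmetic behind the ">=" can be sketched as follows.  This is a toy
model with hypothetical names, assuming only what is stated above: punching
a range out of the middle of one extent leaves two extents, so the extent
count grows by one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code.  Removing [del_off, del_off + del_len) from
 * strictly inside [got_off, got_off + got_len) splits one extent in two.
 */
bool toy_del_splits_extent(long got_off, long got_len,
			   long del_off, long del_len)
{
	assert(got_len > 0 && del_len > 0);
	return del_off > got_off &&
	       del_off + del_len < got_off + got_len;
}

/*
 * The check runs before the extent count has been bumped, so "the count
 * after the split exceeds the limit" (nextents + 1 > maxext) is the same
 * test as nextents >= maxext.
 */
bool toy_split_needs_btree(int nextents, int maxext)
{
	return nextents + 1 > maxext;
}
```

With nextents == maxext the fork is still legal in extent format, but a
split would push it one past the limit, which is the case the ">=" catches
and a plain ">" would miss.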


--Mark Tinguely
   tinguely@sgi.com



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/11] xfs: remove the i_new_size field in struct xfs_inode
  2011-12-18 20:00 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
  2011-12-18 22:13   ` Dave Chinner
  2012-01-16 22:41   ` Ben Myers
@ 2012-01-17 20:14   ` Ben Myers
  2 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-17 20:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Hey,

On Sun, Dec 18, 2011 at 03:00:12PM -0500, Christoph Hellwig wrote:
> Now that we use the VFS i_size field throughout XFS there is no need for the
> i_new_size field any more given that the VFS i_size field gets updated
> in ->write_end before unlocking the page, and thus is a) always uptodate when
							^^ there's no b), so
							I've removed this.
> writeback could see a page.

> Removing i_new_size also has the advantage that
> we will never have to trim back di_size during a failed buffered write,
> given that it never gets updated past i_size.

Not trimming di_size back is very nice.  That was ugly.

> Note that currently the generic direct I/O code only updates i_size after
> calling our end_io handler, which requires a small workaround to make
> sure di_size actually makes it to disk.  I hope to fix this properly in
> the generic code.

Yeah, it looks like not doing this workaround might cause xfs not to
update the di_size properly.  And if it's ok for
generic_file_direct_write to update i_size after calling ->direct_IO, it
looks like it should also be ok to do it from dio_complete.  This
workaround looks fine.
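
A toy model of why the workaround matters, assuming the post-patch
xfs_ioend_new_eof logic quoted below (take the smaller of i_size and the
end of the I/O, and report it only if it exceeds di_size).  The function
and parameter names are hypothetical.

```c
#include <assert.h>

/*
 * Toy version of the post-patch xfs_ioend_new_eof(): returns the new
 * on-disk size, or 0 if the on-disk size does not need to change.
 */
long long toy_ioend_new_eof(long long i_size, long long di_size,
			    long long io_offset, long long io_size)
{
	long long bsize, isize;

	assert(io_offset >= 0 && io_size >= 0);
	bsize = io_offset + io_size;		/* end of this I/O */
	isize = i_size < bsize ? i_size : bsize;
	return isize > di_size ? isize : 0;
}

/* The workaround: raise the in-core size before computing the new EOF. */
long long toy_end_io_direct_write(long long *i_size, long long di_size,
				  long long offset, long long size)
{
	if (offset + size > *i_size)
		*i_size = offset + size;
	return toy_ioend_new_eof(*i_size, di_size, offset, size);
}
```

For an appending direct write of 4096 bytes at offset 4096, with i_size and
di_size both still 4096, the first function on its own returns 0 (no
on-disk size update), while the second raises i_size first and returns
8192.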

> A downside is that we lose the support for parallel non-overlapping O_DIRECT
> appending writes that recently was added.  I don't think keeping the complex
> and fragile i_new_size infrastructure for this is a good tradeoff - if we
> really care about parallel appending writers we should investigate turning
> the iolock into a range lock, which would also allow for parallel
> non-overlapping buffered writers.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I asked around and spent a little time trying to track down where
i_new_size came from.  It had been in xfs for some time even before the
initial import from CVS into the history.git tree on kernel.org.
Before that it was merged into xfs as part of the iocore with some other
CXFS infrastructure.  From there the trail went cold.  So it looks like
it's historical and even today doesn't appear to have value for XFS.

Looks good.
Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
>  fs/xfs/xfs_aops.c  |   28 +++++++++++---------
>  fs/xfs/xfs_file.c  |   72 +++++++----------------------------------------------
>  fs/xfs/xfs_iget.c  |    1 
>  fs/xfs/xfs_inode.h |    2 -
>  fs/xfs/xfs_trace.h |   18 ++-----------
>  5 files changed, 29 insertions(+), 92 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_file.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_file.c	2011-11-30 12:59:11.669698558 +0100
> +++ xfs/fs/xfs/xfs_file.c	2011-11-30 12:59:13.533021797 +0100
> @@ -413,27 +413,6 @@ xfs_file_splice_read(
>  }
>  
>  /*
> - * If this was a direct or synchronous I/O that failed (such as ENOSPC) then
> - * part of the I/O may have been written to disk before the error occurred.  In
> - * this case the on-disk file size may have been adjusted beyond the in-memory
> - * file size and now needs to be truncated back.
> - */
> -STATIC void
> -xfs_aio_write_newsize_update(
> -	struct xfs_inode	*ip,
> -	xfs_fsize_t		new_size)
> -{
> -	if (new_size == ip->i_new_size) {
> -		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> -		if (new_size == ip->i_new_size)
> -			ip->i_new_size = 0;
> -		if (ip->i_d.di_size > i_size_read(VFS_I(ip)))
> -			ip->i_d.di_size = i_size_read(VFS_I(ip));
> -		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
> -	}
> -}
> -
> -/*
>   * xfs_file_splice_write() does not use xfs_rw_ilock() because
>   * generic_file_splice_write() takes the i_mutex itself. This, in theory,
>   * couuld cause lock inversions between the aio_write path and the splice path
> @@ -451,7 +430,6 @@ xfs_file_splice_write(
>  {
>  	struct inode		*inode = outfilp->f_mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
> -	xfs_fsize_t		new_size;
>  	int			ioflags = 0;
>  	ssize_t			ret;
>  
> @@ -465,20 +443,12 @@ xfs_file_splice_write(
>  
>  	xfs_ilock(ip, XFS_IOLOCK_EXCL);
>  
> -	new_size = *ppos + count;
> -
> -	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	if (new_size > i_size_read(inode))
> -		ip->i_new_size = new_size;
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -
>  	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
>  
>  	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
>  	if (ret > 0)
>  		XFS_STATS_ADD(xs_write_bytes, ret);
>  
> -	xfs_aio_write_newsize_update(ip, new_size);
>  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
>  	return ret;
>  }
> @@ -673,16 +643,13 @@ xfs_file_aio_write_checks(
>  	struct file		*file,
>  	loff_t			*pos,
>  	size_t			*count,
> -	xfs_fsize_t		*new_sizep,
>  	int			*iolock)
>  {
>  	struct inode		*inode = file->f_mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
> -	xfs_fsize_t		new_size;
>  	int			error = 0;
>  
>  	xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
> -	*new_sizep = 0;
>  restart:
>  	error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
>  	if (error) {
> @@ -697,15 +664,13 @@ restart:
>  	/*
>  	 * If the offset is beyond the size of the file, we need to zero any
>  	 * blocks that fall between the existing EOF and the start of this
> -	 * write. There is no need to issue zeroing if another in-flght IO ends
> -	 * at or before this one If zeronig is needed and we are currently
> -	 * holding the iolock shared, we need to update it to exclusive which
> -	 * involves dropping all locks and relocking to maintain correct locking
> -	 * order. If we do this, restart the function to ensure all checks and
> -	 * values are still valid.
> +	 * write.  If zeroing is needed and we are currently holding the
> +	 * iolock shared, we need to update it to exclusive which involves
> +	 * dropping all locks and relocking to maintain correct locking order.
> +	 * If we do this, restart the function to ensure all checks and values
> +	 * are still valid.
>  	 */
> -	if ((ip->i_new_size && *pos > ip->i_new_size) ||
> -	    (!ip->i_new_size && *pos > i_size_read(inode))) {
> +	if (*pos > i_size_read(inode)) {
>  		if (*iolock == XFS_IOLOCK_SHARED) {
>  			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
>  			*iolock = XFS_IOLOCK_EXCL;
> @@ -714,19 +679,6 @@ restart:
>  		}
>  		error = -xfs_zero_eof(ip, *pos, i_size_read(inode));
>  	}
> -
> -	/*
> -	 * If this IO extends beyond EOF, we may need to update ip->i_new_size.
> -	 * We have already zeroed space beyond EOF (if necessary).  Only update
> -	 * ip->i_new_size if this IO ends beyond any other in-flight writes.
> -	 */
> -	new_size = *pos + *count;
> -	if (new_size > i_size_read(inode)) {
> -		if (new_size > ip->i_new_size)
> -			ip->i_new_size = new_size;
> -		*new_sizep = new_size;
> -	}
> -
>  	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
>  	if (error)
>  		return error;
> @@ -772,7 +724,6 @@ xfs_file_dio_aio_write(
>  	unsigned long		nr_segs,
>  	loff_t			pos,
>  	size_t			ocount,
> -	xfs_fsize_t		*new_size,
>  	int			*iolock)
>  {
>  	struct file		*file = iocb->ki_filp;
> @@ -817,7 +768,7 @@ xfs_file_dio_aio_write(
>  		xfs_rw_ilock(ip, *iolock);
>  	}
>  
> -	ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
> +	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
>  	if (ret)
>  		return ret;
>  
> @@ -855,7 +806,6 @@ xfs_file_buffered_aio_write(
>  	unsigned long		nr_segs,
>  	loff_t			pos,
>  	size_t			ocount,
> -	xfs_fsize_t		*new_size,
>  	int			*iolock)
>  {
>  	struct file		*file = iocb->ki_filp;
> @@ -869,7 +819,7 @@ xfs_file_buffered_aio_write(
>  	*iolock = XFS_IOLOCK_EXCL;
>  	xfs_rw_ilock(ip, *iolock);
>  
> -	ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
> +	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
>  	if (ret)
>  		return ret;
>  
> @@ -909,7 +859,6 @@ xfs_file_aio_write(
>  	ssize_t			ret;
>  	int			iolock;
>  	size_t			ocount = 0;
> -	xfs_fsize_t		new_size = 0;
>  
>  	XFS_STATS_INC(xs_write_calls);
>  
> @@ -929,10 +878,10 @@ xfs_file_aio_write(
>  
>  	if (unlikely(file->f_flags & O_DIRECT))
>  		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
> -						ocount, &new_size, &iolock);
> +						ocount, &iolock);
>  	else
>  		ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
> -						ocount, &new_size, &iolock);
> +						ocount, &iolock);
>  
>  	if (ret <= 0)
>  		goto out_unlock;
> @@ -953,7 +902,6 @@ xfs_file_aio_write(
>  	}
>  
>  out_unlock:
> -	xfs_aio_write_newsize_update(ip, new_size);
>  	xfs_rw_iunlock(ip, iolock);
>  	return ret;
>  }
> Index: xfs/fs/xfs/xfs_aops.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_aops.c	2011-11-30 12:59:11.669698558 +0100
> +++ xfs/fs/xfs/xfs_aops.c	2011-12-01 08:12:10.946664057 +0100
> @@ -111,8 +111,7 @@ xfs_ioend_new_eof(
>  	xfs_fsize_t		bsize;
>  
>  	bsize = ioend->io_offset + ioend->io_size;
> -	isize = MAX(i_size_read(VFS_I(ip)), ip->i_new_size);
> -	isize = MIN(isize, bsize);
> +	isize = MIN(i_size_read(VFS_I(ip)), bsize);
>  	return isize > ip->i_d.di_size ? isize : 0;
>  }
>  
> @@ -126,11 +125,7 @@ static inline bool xfs_ioend_is_append(s
>  }
>  
>  /*
> - * Update on-disk file size now that data has been written to disk.  The
> - * current in-memory file size is i_size.  If a write is beyond eof i_new_size
> - * will be the intended file size until i_size is updated.  If this write does
> - * not extend all the way to the valid file size then restrict this update to
> - * the end of the write.
> + * Update on-disk file size now that data has been written to disk.
>   *
>   * This function does not block as blocking on the inode lock in IO completion
>   * can lead to IO completion order dependency deadlocks.. If it can't get the
> @@ -1279,6 +1274,15 @@ xfs_end_io_direct_write(
>  	struct xfs_ioend	*ioend = iocb->private;
>  
>  	/*
> +	 * While the generic direct I/O code updates the inode size, it does
> +	 * so only after the end_io handler is called, which means our
> +	 * end_io handler thinks the on-disk size is outside the in-core
> +	 * size.  To prevent this just update it a little bit earlier here.
> +	 */
> +	if (offset + size > i_size_read(ioend->io_inode))
> +		i_size_write(ioend->io_inode, offset + size);
> +
> +	/*
>  	 * blockdev_direct_IO can return an error even after the I/O
>  	 * completion handler was called.  Thus we need to protect
>  	 * against double-freeing.
> @@ -1340,12 +1344,10 @@ xfs_vm_write_failed(
>  
>  	if (to > inode->i_size) {
>  		/*
> -		 * punch out the delalloc blocks we have already allocated. We
> -		 * don't call xfs_setattr() to do this as we may be in the
> -		 * middle of a multi-iovec write and so the vfs inode->i_size
> -		 * will not match the xfs ip->i_size and so it will zero too
> -		 * much. Hence we jus truncate the page cache to zero what is
> -		 * necessary and punch the delalloc blocks directly.
> +		 * Punch out the delalloc blocks we have already allocated.
> +		 *
> +		 * Don't bother with xfs_setattr given that nothing can have
> +		 * it do disk yet as the page is still locked at this point.
"have it do disk" should read "have made it to disk"; I've updated this.

>  		 */
>  		struct xfs_inode	*ip = XFS_I(inode);
>  		xfs_fileoff_t		start_fsb;
> Index: xfs/fs/xfs/xfs_iget.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_iget.c	2011-11-30 12:59:11.676365190 +0100
> +++ xfs/fs/xfs/xfs_iget.c	2011-11-30 12:59:13.533021797 +0100
> @@ -94,7 +94,6 @@ xfs_inode_alloc(
>  	ip->i_update_core = 0;
>  	ip->i_delayed_blks = 0;
>  	memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
> -	ip->i_new_size = 0;
>  
>  	return ip;
>  }
> Index: xfs/fs/xfs/xfs_trace.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_trace.h	2011-11-30 12:59:11.673031874 +0100
> +++ xfs/fs/xfs/xfs_trace.h	2011-11-30 12:59:13.536355113 +0100
> @@ -891,7 +891,6 @@ DECLARE_EVENT_CLASS(xfs_file_class,
>  		__field(dev_t, dev)
>  		__field(xfs_ino_t, ino)
>  		__field(xfs_fsize_t, size)
> -		__field(xfs_fsize_t, new_size)
>  		__field(loff_t, offset)
>  		__field(size_t, count)
>  		__field(int, flags)
> @@ -900,17 +899,15 @@ DECLARE_EVENT_CLASS(xfs_file_class,
>  		__entry->dev = VFS_I(ip)->i_sb->s_dev;
>  		__entry->ino = ip->i_ino;
>  		__entry->size = ip->i_d.di_size;
> -		__entry->new_size = ip->i_new_size;
>  		__entry->offset = offset;
>  		__entry->count = count;
>  		__entry->flags = flags;
>  	),
> -	TP_printk("dev %d:%d ino 0x%llx size 0x%llx new_size 0x%llx "
> +	TP_printk("dev %d:%d ino 0x%llx size 0x%llx "
>  		  "offset 0x%llx count 0x%zx ioflags %s",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
>  		  __entry->ino,
>  		  __entry->size,
> -		  __entry->new_size,
>  		  __entry->offset,
>  		  __entry->count,
>  		  __print_flags(__entry->flags, "|", XFS_IO_FLAGS))
> @@ -978,7 +975,6 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
>  		__field(dev_t, dev)
>  		__field(xfs_ino_t, ino)
>  		__field(loff_t, size)
> -		__field(loff_t, new_size)
>  		__field(loff_t, offset)
>  		__field(size_t, count)
>  		__field(int, type)
> @@ -990,7 +986,6 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
>  		__entry->dev = VFS_I(ip)->i_sb->s_dev;
>  		__entry->ino = ip->i_ino;
>  		__entry->size = ip->i_d.di_size;
> -		__entry->new_size = ip->i_new_size;
>  		__entry->offset = offset;
>  		__entry->count = count;
>  		__entry->type = type;
> @@ -998,13 +993,11 @@ DECLARE_EVENT_CLASS(xfs_imap_class,
>  		__entry->startblock = irec ? irec->br_startblock : 0;
>  		__entry->blockcount = irec ? irec->br_blockcount : 0;
>  	),
> -	TP_printk("dev %d:%d ino 0x%llx size 0x%llx new_size 0x%llx "
> -		  "offset 0x%llx count %zd type %s "
> -		  "startoff 0x%llx startblock %lld blockcount 0x%llx",
> +	TP_printk("dev %d:%d ino 0x%llx size 0x%llx offset 0x%llx count %zd "
> +		  "type %s startoff 0x%llx startblock %lld blockcount 0x%llx",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
>  		  __entry->ino,
>  		  __entry->size,
> -		  __entry->new_size,
>  		  __entry->offset,
>  		  __entry->count,
>  		  __print_symbolic(__entry->type, XFS_IO_TYPES),
> @@ -1031,7 +1024,6 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
>  		__field(xfs_ino_t, ino)
>  		__field(loff_t, isize)
>  		__field(loff_t, disize)
> -		__field(loff_t, new_size)
>  		__field(loff_t, offset)
>  		__field(size_t, count)
>  	),
> @@ -1040,17 +1032,15 @@ DECLARE_EVENT_CLASS(xfs_simple_io_class,
>  		__entry->ino = ip->i_ino;
>  		__entry->isize = VFS_I(ip)->i_size;
>  		__entry->disize = ip->i_d.di_size;
> -		__entry->new_size = ip->i_new_size;
>  		__entry->offset = offset;
>  		__entry->count = count;
>  	),
> -	TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx new_size 0x%llx "
> +	TP_printk("dev %d:%d ino 0x%llx isize 0x%llx disize 0x%llx "
>  		  "offset 0x%llx count %zd",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
>  		  __entry->ino,
>  		  __entry->isize,
>  		  __entry->disize,
> -		  __entry->new_size,
>  		  __entry->offset,
>  		  __entry->count)
>  );
> Index: xfs/fs/xfs/xfs_inode.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_inode.h	2011-11-30 12:59:11.679698505 +0100
> +++ xfs/fs/xfs/xfs_inode.h	2011-12-01 08:12:10.839997391 +0100
> @@ -246,8 +246,6 @@ typedef struct xfs_inode {
>  
>  	xfs_icdinode_t		i_d;		/* most of ondisk inode */
>  
> -	xfs_fsize_t		i_new_size;	/* size when write completes */
> -
>  	/* VFS inode */
>  	struct inode		i_vnode;	/* embedded VFS inode */
>  } xfs_inode_t;
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks
  2011-12-18 20:00 ` [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks Christoph Hellwig
@ 2012-01-17 20:18   ` Ben Myers
  2012-01-20 12:51     ` Jeff Liu
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Myers @ 2012-01-17 20:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:13PM -0500, Christoph Hellwig wrote:
> While xfs_iunlock is fine with 0 lockflags the calling conventions are much
> cleaner if xfs_file_aio_write_checks never returns without the iolock held.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.
Reviewed-by: Ben Myers <bpm@sgi.com>

> 
> ---
>  fs/xfs/xfs_file.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_file.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_file.c	2011-12-07 12:46:31.343897882 +0100
> +++ xfs/fs/xfs/xfs_file.c	2011-12-07 12:48:33.309903801 +0100
> @@ -636,7 +636,9 @@ out_lock:
>  /*
>   * Common pre-write limit and setup checks.
>   *
> - * Returns with iolock held according to @iolock.
> + * Called with the iolocked held either shared and exclusive according to
> + * @iolock, and returns with it held.  Might upgrade the iolock to exclusive
> + * if called for a direct write beyond i_size.
>   */
>  STATIC ssize_t
>  xfs_file_aio_write_checks(
> @@ -653,8 +655,7 @@ xfs_file_aio_write_checks(
>  restart:
>  	error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
>  	if (error) {
> -		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
> -		*iolock = 0;
> +		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
>  		return error;
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/11] xfs: cleanup xfs_file_aio_write
  2011-12-18 20:00 ` [PATCH 11/11] xfs: cleanup xfs_file_aio_write Christoph Hellwig
@ 2012-01-17 20:42   ` Ben Myers
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Myers @ 2012-01-17 20:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Dec 18, 2011 at 03:00:14PM -0500, Christoph Hellwig wrote:
> With all the size field updates out of the way xfs_file_aio_write can
> be further simplified by pushing all iolock handling into
> xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
> the generic generic_write_sync helper for synchronous writes.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good!
Reviewed-by: Ben Myers <bpm@sgi.com>

> ---
>  fs/xfs/xfs_file.c |   82 ++++++++++++++++++++++++------------------------------
>  1 file changed, 37 insertions(+), 45 deletions(-)
> 
> Index: xfs/fs/xfs/xfs_file.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_file.c	2011-12-07 12:48:33.309903801 +0100
> +++ xfs/fs/xfs/xfs_file.c	2011-12-07 12:48:48.546487926 +0100
> @@ -724,8 +724,7 @@ xfs_file_dio_aio_write(
>  	const struct iovec	*iovp,
>  	unsigned long		nr_segs,
>  	loff_t			pos,
> -	size_t			ocount,
> -	int			*iolock)
> +	size_t			ocount)
>  {
>  	struct file		*file = iocb->ki_filp;
>  	struct address_space	*mapping = file->f_mapping;
> @@ -735,10 +734,10 @@ xfs_file_dio_aio_write(
>  	ssize_t			ret = 0;
>  	size_t			count = ocount;
>  	int			unaligned_io = 0;
> +	int			iolock;
>  	struct xfs_buftarg	*target = XFS_IS_REALTIME_INODE(ip) ?
>  					mp->m_rtdev_targp : mp->m_ddev_targp;
>  
> -	*iolock = 0;
>  	if ((pos & target->bt_smask) || (count & target->bt_smask))
>  		return -XFS_ERROR(EINVAL);
>  
> @@ -753,31 +752,31 @@ xfs_file_dio_aio_write(
>  	 * EOF zeroing cases and fill out the new inode size as appropriate.
>  	 */
>  	if (unaligned_io || mapping->nrpages)
> -		*iolock = XFS_IOLOCK_EXCL;
> +		iolock = XFS_IOLOCK_EXCL;
>  	else
> -		*iolock = XFS_IOLOCK_SHARED;
> -	xfs_rw_ilock(ip, *iolock);
> +		iolock = XFS_IOLOCK_SHARED;
> +	xfs_rw_ilock(ip, iolock);
>  
>  	/*
>  	 * Recheck if there are cached pages that need invalidate after we got
>  	 * the iolock to protect against other threads adding new pages while
>  	 * we were waiting for the iolock.
>  	 */
> -	if (mapping->nrpages && *iolock == XFS_IOLOCK_SHARED) {
> -		xfs_rw_iunlock(ip, *iolock);
> -		*iolock = XFS_IOLOCK_EXCL;
> -		xfs_rw_ilock(ip, *iolock);
> +	if (mapping->nrpages && iolock == XFS_IOLOCK_SHARED) {
> +		xfs_rw_iunlock(ip, iolock);
> +		iolock = XFS_IOLOCK_EXCL;
> +		xfs_rw_ilock(ip, iolock);
>  	}
>  
> -	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
> +	ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
>  	if (ret)
> -		return ret;
> +		goto out;
>  
>  	if (mapping->nrpages) {
>  		ret = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
>  							FI_REMAPF_LOCKED);
>  		if (ret)
> -			return ret;
> +			goto out;
>  	}
>  
>  	/*
> @@ -786,15 +785,18 @@ xfs_file_dio_aio_write(
>  	 */
>  	if (unaligned_io)
>  		inode_dio_wait(inode);
> -	else if (*iolock == XFS_IOLOCK_EXCL) {
> +	else if (iolock == XFS_IOLOCK_EXCL) {
>  		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
> -		*iolock = XFS_IOLOCK_SHARED;
> +		iolock = XFS_IOLOCK_SHARED;
>  	}
>  
>  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
>  	ret = generic_file_direct_write(iocb, iovp,
>  			&nr_segs, pos, &iocb->ki_pos, count, ocount);
>  
> +out:
> +	xfs_rw_iunlock(ip, iolock);
> +
>  	/* No fallback to buffered IO on errors for XFS. */
>  	ASSERT(ret < 0 || ret == count);
>  	return ret;
> @@ -806,8 +808,7 @@ xfs_file_buffered_aio_write(
>  	const struct iovec	*iovp,
>  	unsigned long		nr_segs,
>  	loff_t			pos,
> -	size_t			ocount,
> -	int			*iolock)
> +	size_t			ocount)
>  {
>  	struct file		*file = iocb->ki_filp;
>  	struct address_space	*mapping = file->f_mapping;
> @@ -815,14 +816,14 @@ xfs_file_buffered_aio_write(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	ssize_t			ret;
>  	int			enospc = 0;
> +	int			iolock = XFS_IOLOCK_EXCL;
>  	size_t			count = ocount;
>  
> -	*iolock = XFS_IOLOCK_EXCL;
> -	xfs_rw_ilock(ip, *iolock);
> +	xfs_rw_ilock(ip, iolock);
>  
> -	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
> +	ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
>  	if (ret)
> -		return ret;
> +		goto out;
>  
>  	/* We can write back this queue in page reclaim */
>  	current->backing_dev_info = mapping->backing_dev_info;
> @@ -836,13 +837,15 @@ write_retry:
>  	 * page locks and retry *once*
>  	 */
>  	if (ret == -ENOSPC && !enospc) {
> -		ret = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
> -		if (ret)
> -			return ret;
>  		enospc = 1;
> -		goto write_retry;
> +		ret = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
> +		if (!ret)
> +			goto write_retry;
>  	}
> +
>  	current->backing_dev_info = NULL;
> +out:
> +	xfs_rw_iunlock(ip, iolock);
>  	return ret;
>  }
>  
> @@ -858,7 +861,6 @@ xfs_file_aio_write(
>  	struct inode		*inode = mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	ssize_t			ret;
> -	int			iolock;
>  	size_t			ocount = 0;
>  
>  	XFS_STATS_INC(xs_write_calls);
> @@ -878,32 +880,22 @@ xfs_file_aio_write(
>  		return -EIO;
>  
>  	if (unlikely(file->f_flags & O_DIRECT))
> -		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
> -						ocount, &iolock);
> +		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos, ocount);
>  	else
>  		ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
> -						ocount, &iolock);
> +						  ocount);
>  
> -	if (ret <= 0)
> -		goto out_unlock;
> +	if (ret > 0) {
> +		ssize_t err;
>  
> -	XFS_STATS_ADD(xs_write_bytes, ret);
> +		XFS_STATS_ADD(xs_write_bytes, ret);
>  
> -	/* Handle various SYNC-type writes */
> -	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
> -		loff_t end = pos + ret - 1;
> -		int error;
> -
> -		xfs_rw_iunlock(ip, iolock);
> -		error = xfs_file_fsync(file, pos, end,
> -				      (file->f_flags & __O_SYNC) ? 0 : 1);
> -		xfs_rw_ilock(ip, iolock);
> -		if (error)
> -			ret = error;
> +		/* Handle various SYNC-type writes */
> +		err = generic_write_sync(file, pos, ret);
> +		if (err < 0)
> +			ret = err;
>  	}
>  
> -out_unlock:
> -	xfs_rw_iunlock(ip, iolock);
>  	return ret;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks
  2012-01-17 20:18   ` Ben Myers
@ 2012-01-20 12:51     ` Jeff Liu
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Liu @ 2012-01-20 12:51 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

On 01/18/2012 04:18 AM, Ben Myers wrote:

> On Sun, Dec 18, 2011 at 03:00:13PM -0500, Christoph Hellwig wrote:
>> While xfs_iunlock is fine with 0 lockflags the calling conventions are much
>> cleaner if xfs_file_aio_write_checks never returns without the iolock held.
>>
>> Reviewed-by: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Looks good.
> Reviewed-by: Ben Myers <bpm@sgi.com>
> 
>>
>> ---
>>  fs/xfs/xfs_file.c |    7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> Index: xfs/fs/xfs/xfs_file.c
>> ===================================================================
>> --- xfs.orig/fs/xfs/xfs_file.c	2011-12-07 12:46:31.343897882 +0100
>> +++ xfs/fs/xfs/xfs_file.c	2011-12-07 12:48:33.309903801 +0100
>> @@ -636,7 +636,9 @@ out_lock:
>>  /*
>>   * Common pre-write limit and setup checks.
>>   *
>> - * Returns with iolock held according to @iolock.
>> + * Called with the iolocked held either shared and exclusive according to
>> + * @iolock, and returns with it held.  Might upgrade the iolock to exclusive
>> + * if called for a direct write beyond i_size.
>>   */
>>  STATIC ssize_t
>>  xfs_file_aio_write_checks(
>> @@ -653,8 +655,7 @@ xfs_file_aio_write_checks(
>>  restart:
>>  	error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
>>  	if (error) {
>> -		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
>> -		*iolock = 0;
>> +		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
>>  		return error;
>>  	}

Haha, I was lucky to see this patch, since I just triggered an Oops on 3.2.0-rc6 without it.

FYI, test code:

#define _GNU_SOURCE
#define _LARGEFILE64_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <ctype.h>
#include <errno.h>
#include <linux/falloc.h>

int
main(int argc, char **argv)
{
	off64_t ret = 0;
	char *path;
	int fd;

	if (argc != 2) {
		fprintf(stdout, "Usage: %s filename\n", argv[0]);
		return 1;
	}

	path = argv[1];		
	fd = open(path, O_WRONLY|O_CREAT|O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return fd;
	}

	ret = fallocate(fd, 0, 0, 1);
	if (ret < 0) {
		fprintf(stderr, "fallocate failed due to %s\n", strerror(errno));
		return ret;
	}

	ret = lseek64(fd, (off64_t)2947483650, SEEK_SET);
	if (ret < 0) {
		perror("lseek");
		return ret;
	}

	write(fd, "abc", 3);

	ret = lseek64(fd, (off64_t)2997483650, SEEK_SET);
	if (ret < 0) {
		perror("lseek");
		return ret;
	}

	write(fd, "abc", 3);

	close(fd);

	return 0;
}

[  135.592793] XFS: Assertion failed: lock_flags != 0, file: fs/xfs/xfs_iget.c, line: 652
[  135.592836] ------------[ cut here ]------------
[  135.596010] kernel BUG at fs/xfs/xfs_message.c:101!
[  135.596010] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  135.596010] Modules linked in: xfs(O) binfmt_misc kvm_intel kvm ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle pppoe pppox nfsd exportfs nfs ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 lockd nf_conntrack nf_defrag_ipv4 parport_pc ppdev ip_tables x_tables fscache auth_rpcgss bridge nfs_acl sunrpc stp snd_hda_codec_analog arc4 snd_hda_intel snd_hda_codec snd_hwdep iwl4965 snd_pcm iwl_legacy thinkpad_acpi i915 snd_seq_midi snd_rawmidi pcmcia mac80211 joydev snd_seq_midi_event snd_seq drm_kms_helper btrfs snd_timer snd_seq_device zlib_deflate libcrc32c psmouse btusb yenta_socket pcmcia_rsrc drm cfg80211 bluetooth serio_raw pcmcia_core snd tpm_tis tpm tpm_bios nvram soundcore snd_page_alloc lp i2c_algo_bit parport video ext4 mbcache jbd2 firewire_ohci firewire_core crc_itu_t ahci libahci e1000e
[  135.596010] 
[  135.596010] Pid: 2313, comm: faa Tainted: G        W  O 3.2.0-rc6 #9 LENOVO 7661D43/7661D43
[  135.596010] EIP: 0060:[<f94ae5e8>] EFLAGS: 00010246 CPU: 1
[  135.596010] EIP is at assfail+0x47/0x57 [xfs]
[  135.596010] EAX: 00000060 EBX: e6674400 ECX: 00000000 EDX: 00000073
[  135.596010] ESI: 00000000 EDI: 00000000 EBP: e8d59e34 ESP: e8d59e20
[  135.596010]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  135.596010] Process faa (pid: 2313, ti=e8d58000 task=e919c6c0 task.ti=e8d58000)
[  135.596010] Stack:
[  135.596010]  00000000 f956b082 f956abbb f956a97f 0000028c e8d59e50 f94a3027 e8d59e4c
[  135.596010]  f949c2bd e6674400 00000000 f956171c e8d59e60 f949c2bd afaf0802 00000000
[  135.596010]  e8d59ed4 f949eb38 afaf0802 00000000 00000003 e8d59eb8 e8d59ec4 000021ef
[  135.596010] Call Trace:
[  135.596010]  [<f94a3027>] xfs_iunlock+0xe6/0x2c0 [xfs]
[  135.596010]  [<f949c2bd>] ? xfs_rw_iunlock+0x21/0x5f [xfs]
[  135.596010]  [<f949c2bd>] xfs_rw_iunlock+0x21/0x5f [xfs]
[  135.596010]  [<f949eb38>] xfs_file_aio_write+0x3cf/0x3e8 [xfs]
[  135.596010]  [<c1215c94>] do_sync_write+0xaa/0x12c
[  135.596010]  [<c12edddc>] ? security_file_permission+0x53/0x65
[  135.596010]  [<c12166a5>] ? rw_verify_area+0x1c4/0x1fe
[  135.596010]  [<c1215bea>] ? wait_on_retry_sync_kiocb+0x8c/0x8c
[  135.596010]  [<c1216af6>] vfs_write+0xf5/0x1a3
[  135.596010]  [<c1218dff>] ? fget_light+0x3e/0x130
[  135.596010]  [<c1216e59>] sys_write+0x6c/0xa9
[  135.596010]  [<c18178b4>] syscall_call+0x7/0xb
[  135.596010] Code: 10 89 54 24 0c 89 44 24 08 c7 44 24 04 82 b0 56 f9 c7 04 24 00 00 00 00 e8 ad fc ff ff 83 05 e0 45 59 f9 01 83 15 e4 45 59 f9 00 <0f> 0b 83 05 e8 45 59 f9 01 83 15 ec 45 59 f9 00 55 89 e5 83 ec 
[  135.596010] EIP: [<f94ae5e8>] assfail+0x47/0x57 [xfs] SS:ESP 0068:e8d59e20
[  135.833048] ---[ end trace 4af94dd273c5d0cb ]---

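Call chain:
xfs_file_aio_write() -> xfs_file_buffered_aio_write() -> xfs_file_aio_write_checks() -> generic_write_checks()

generic_write_checks() failed with -EFBIG, and the iolock had already been set to *ZERO* by the time xfs_rw_iunlock() was called, which trips the lock_flags != 0 assertion in xfs_iunlock(). :-P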
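
To make the failure mode concrete, here is a minimal user-space sketch of the pattern described above. It is not the XFS code: every name in it (model_inode, model_write_checks_buggy, MODEL_IOLOCK_EXCL, and so on) is hypothetical, and the assumption that the error path both drops the lock and zeroes the caller's iolock is only a model of "the iolock was set to zero". The unlock helper returns -EINVAL where the kernel would hit the ASSERT.

```c
#include <errno.h>

/* Hypothetical flag standing in for an iolock mode; not the real XFS value. */
#define MODEL_IOLOCK_EXCL 0x1u

struct model_inode {
	unsigned int held;	/* lock modes currently held */
};

static void model_lock(struct model_inode *ip, unsigned int flags)
{
	ip->held |= flags;
}

/*
 * Returns 0 on success, or -EINVAL when called with no lock flags.
 * The -EINVAL case models the "lock_flags != 0" assertion.
 */
static int model_unlock(struct model_inode *ip, unsigned int flags)
{
	if (flags == 0)
		return -EINVAL;
	ip->held &= ~flags;
	return 0;
}

/* Assumed buggy variant: the error path unlocks and zeroes *iolock. */
static int model_write_checks_buggy(struct model_inode *ip,
				    unsigned int *iolock, int fail)
{
	if (fail) {
		model_unlock(ip, *iolock);
		*iolock = 0;
		return -EFBIG;
	}
	return 0;
}

/* Variant that always returns with the lock still held, even on error. */
static int model_write_checks_fixed(struct model_inode *ip,
				    unsigned int *iolock, int fail)
{
	(void)ip;
	(void)iolock;
	return fail ? -EFBIG : 0;
}

/*
 * Caller that takes the lock, runs the checks, and unlocks unconditionally.
 * Returns the result of the final unlock so the problem is observable.
 */
static int model_write(int (*checks)(struct model_inode *, unsigned int *, int),
		       int fail)
{
	struct model_inode ip = { 0 };
	unsigned int iolock = MODEL_IOLOCK_EXCL;

	model_lock(&ip, iolock);
	(void)checks(&ip, &iolock, fail);
	return model_unlock(&ip, iolock);
}
```

With the buggy variant and a failing check, model_write() ends up unlocking with zero flags, which is the situation the assertion in the oops above complains about. With the variant that keeps the lock held, the caller's unconditional unlock is safe.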

Thanks,
-Jeff

end of thread, last message ~2012-01-20 12:51 UTC

Thread overview: 32+ messages
2011-12-18 20:00 [PATCH 00/11] inode shrink and misc updates V2 Christoph Hellwig
2011-12-18 20:00 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
2012-01-03 21:53   ` Ben Myers
2012-01-04  9:27     ` Christoph Hellwig
2011-12-18 20:00 ` [PATCH 02/11] xfs: cleanup xfs_iomap_eof_align_last_fsb Christoph Hellwig
2012-01-04 20:32   ` Ben Myers
2011-12-18 20:00 ` [PATCH 03/11] xfs: remove the unused dm_attrs structure Christoph Hellwig
2012-01-04 21:13   ` Ben Myers
2011-12-18 20:00 ` [PATCH 04/11] xfs: remove the if_ext_max field in struct xfs_ifork Christoph Hellwig
2012-01-06 16:58   ` Ben Myers
2012-01-16 22:45     ` Ben Myers
2012-01-17 15:16       ` Ben Myers
2012-01-17 17:04         ` Mark Tinguely
2011-12-18 20:00 ` [PATCH 05/11] xfs: make i_flags an unsigned long Christoph Hellwig
2011-12-18 20:00 ` [PATCH 06/11] xfs: replace i_flock with a sleeping bitlock Christoph Hellwig
2012-01-13 21:49   ` Ben Myers
2011-12-18 20:00 ` [PATCH 07/11] xfs: replace i_pin_wait with a bit waitqueue Christoph Hellwig
2012-01-13 22:42   ` Ben Myers
2011-12-18 20:00 ` [PATCH 08/11] xfs: remove the i_size field in struct xfs_inode Christoph Hellwig
2012-01-16 18:32   ` Ben Myers
2012-01-16 19:45     ` Ben Myers
2011-12-18 20:00 ` [PATCH 09/11] xfs: remove the i_new_size " Christoph Hellwig
2011-12-18 22:13   ` Dave Chinner
2012-01-16 22:41   ` Ben Myers
2012-01-17 20:14   ` Ben Myers
2011-12-18 20:00 ` [PATCH 10/11] xfs: always return with the iolock held from xfs_file_aio_write_checks Christoph Hellwig
2012-01-17 20:18   ` Ben Myers
2012-01-20 12:51     ` Jeff Liu
2011-12-18 20:00 ` [PATCH 11/11] xfs: cleanup xfs_file_aio_write Christoph Hellwig
2012-01-17 20:42   ` Ben Myers
  -- strict thread matches above, loose matches on Subject: below --
2011-12-08 15:57 [PATCH 00/11] inode shrink and misc updates Christoph Hellwig
2011-12-08 15:57 ` [PATCH 01/11] xfs: remove xfs_itruncate_data Christoph Hellwig
2011-12-13 21:23   ` Dave Chinner
