public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] sync improvements
@ 2009-08-27 23:15 Christoph Hellwig
  2009-08-27 23:15 ` [PATCH 1/4] xfs: mark inodes dirty before issuing I/O Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-08-27 23:15 UTC (permalink / raw)
  To: xfs

Revisit Dave's prototype to make sync equivalent to freeze, that is make
sure not only we have all data on disk, but also the metadata in the right
place and not requite a log recovery.  That fixes test 182 and should also
help with the frequent grub complaints.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] xfs: mark inodes dirty before issuing I/O
  2009-08-27 23:15 [PATCH 0/4] sync improvements Christoph Hellwig
@ 2009-08-27 23:15 ` Christoph Hellwig
  2009-08-28 23:17   ` Alex Elder
  2009-08-27 23:15 ` [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-08-27 23:15 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-mark-inode-dirty-early --]
[-- Type: text/plain, Size: 2675 bytes --]

From: Dave Chinner <david@fromorbit.com>

To make sure they get properly waited on in sync when I/O is in flight and
we latter need to update the inode size.  Requires a new helper to check if an
ioend structure is beyond the current EOF.


Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2009-08-26 20:06:07.105357325 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2009-08-26 20:08:09.017355358 -0300
@@ -186,19 +186,37 @@ xfs_destroy_ioend(
 }
 
 /*
+ * If the end of the current ioend is beyond the current EOF,
+ * return the new EOF value, otherwise zero.
+ */
+STATIC xfs_fsize_t
+xfs_ioend_new_eof(
+	xfs_ioend_t		*ioend)
+{
+	xfs_inode_t		*ip = XFS_I(ioend->io_inode);
+	xfs_fsize_t		isize;
+	xfs_fsize_t		bsize;
+
+	bsize = ioend->io_offset + ioend->io_size;
+	isize = MAX(ip->i_size, ip->i_new_size);
+	isize = MIN(isize, bsize);
+	return isize > ip->i_d.di_size ? isize : 0;
+}
+
+/*
  * Update on-disk file size now that data has been written to disk.
  * The current in-memory file size is i_size.  If a write is beyond
  * eof i_new_size will be the intended file size until i_size is
  * updated.  If this write does not extend all the way to the valid
  * file size then restrict this update to the end of the write.
  */
+
 STATIC void
 xfs_setfilesize(
 	xfs_ioend_t		*ioend)
 {
 	xfs_inode_t		*ip = XFS_I(ioend->io_inode);
 	xfs_fsize_t		isize;
-	xfs_fsize_t		bsize;
 
 	ASSERT((ip->i_d.di_mode & S_IFMT) == S_IFREG);
 	ASSERT(ioend->io_type != IOMAP_READ);
@@ -206,14 +224,9 @@ xfs_setfilesize(
 	if (unlikely(ioend->io_error))
 		return;
 
-	bsize = ioend->io_offset + ioend->io_size;
-
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-
-	isize = MAX(ip->i_size, ip->i_new_size);
-	isize = MIN(isize, bsize);
-
-	if (ip->i_d.di_size < isize) {
+	isize = xfs_ioend_new_eof(ioend);
+	if (isize) {
 		ip->i_d.di_size = isize;
 		xfs_mark_inode_dirty_sync(ip);
 	}
@@ -403,10 +416,16 @@ xfs_submit_ioend_bio(
 	struct bio	*bio)
 {
 	atomic_inc(&ioend->io_remaining);
-
 	bio->bi_private = ioend;
 	bio->bi_end_io = xfs_end_bio;
 
+	/*
+	 * If the I/O is beyond EOF we mark the inode dirty immediately
+	 * but don't update the inode size until I/O completion.
+	 */
+	if (xfs_ioend_new_eof(ioend))
+		xfs_mark_inode_dirty_sync(XFS_I(ioend->io_inode));
+
 	submit_bio(WRITE, bio);
 	ASSERT(!bio_flagged(bio, BIO_EOPNOTSUPP));
 	bio_put(bio);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log
  2009-08-27 23:15 [PATCH 0/4] sync improvements Christoph Hellwig
  2009-08-27 23:15 ` [PATCH 1/4] xfs: mark inodes dirty before issuing I/O Christoph Hellwig
@ 2009-08-27 23:15 ` Christoph Hellwig
  2009-08-28 23:18   ` Alex Elder
  2009-08-27 23:15 ` [PATCH 3/4] [PATCH 2/5] xfs: cleanup ->sync_fs Christoph Hellwig
  2009-08-27 23:15 ` [PATCH 4/4] [PATCH 5/5] xfs: fix xfs_quiesce_data Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-08-27 23:15 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-xfs_sync_fsdata --]
[-- Type: text/plain, Size: 2378 bytes --]

From: Dave Chinner <david@fromorbit.com>

We want to always cover the log after writing out the superblock, and
in case of a synchronous writeout make sure we actually wait for the
log to be covered.  That way a filesystem that has been sync()ed can
be considered clean by log recovery.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-07-07 16:03:51.871239376 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-07-07 18:24:48.722251281 +0200
@@ -309,11 +309,15 @@ xfs_sync_attr(
 STATIC int
 xfs_commit_dummy_trans(
 	struct xfs_mount	*mp,
-	uint			log_flags)
+	uint			flags)
 {
 	struct xfs_inode	*ip = mp->m_rootip;
 	struct xfs_trans	*tp;
 	int			error;
+	int			log_flags = XFS_LOG_FORCE;
+
+	if (flags & SYNC_WAIT)
+		log_flags |= XFS_LOG_SYNC;
 
 	/*
 	 * Put a dummy transaction in the log to tell recovery
@@ -331,13 +335,12 @@ xfs_commit_dummy_trans(
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_ihold(tp, ip);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	/* XXX(hch): ignoring the error here.. */
 	error = xfs_trans_commit(tp, 0);
-
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
+	/* the log force ensures this transaction is pushed to disk */
 	xfs_log_force(mp, 0, log_flags);
-	return 0;
+	return error;
 }
 
 int
@@ -377,6 +380,7 @@ xfs_sync_fsdata(
 		 */
 		if (XFS_BUF_ISPINNED(bp))
 			xfs_log_force(mp, 0, XFS_LOG_FORCE);
+		xfs_flush_buftarg(mp->m_ddev_targp, 1);
 	}
 
 
@@ -385,7 +389,10 @@ xfs_sync_fsdata(
 	else
 		XFS_BUF_ASYNC(bp);
 
-	return xfs_bwrite(mp, bp);
+	error = xfs_bwrite(mp, bp);
+	if (!error && xfs_log_need_covered(mp))
+		error = xfs_commit_dummy_trans(mp, (flags & SYNC_WAIT));
+	return error;
 
  out_brelse:
 	xfs_buf_relse(bp);
@@ -571,8 +578,6 @@ xfs_sync_worker(
 		/* dgc: errors ignored here */
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
 		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
-		if (xfs_log_need_covered(mp))
-			error = xfs_commit_dummy_trans(mp, XFS_LOG_FORCE);
 	}
 	mp->m_sync_seq++;
 	wake_up(&mp->m_wait_single_sync_task);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/4] [PATCH 2/5] xfs: cleanup ->sync_fs
  2009-08-27 23:15 [PATCH 0/4] sync improvements Christoph Hellwig
  2009-08-27 23:15 ` [PATCH 1/4] xfs: mark inodes dirty before issuing I/O Christoph Hellwig
  2009-08-27 23:15 ` [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log Christoph Hellwig
@ 2009-08-27 23:15 ` Christoph Hellwig
  2009-08-28 23:18   ` Alex Elder
  2009-08-27 23:15 ` [PATCH 4/4] [PATCH 5/5] xfs: fix xfs_quiesce_data Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-08-27 23:15 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-cleanup-sync_fs --]
[-- Type: text/plain, Size: 2867 bytes --]

Sort out ->sync_fs to not perform a superblock writeback for the wait = 0 case
as that is just an optional first pass and the superblock will be written back
properly in the next call with wait = 1.  Instead perform an opportunistic
quota writeback to have less work later.  Also remove the freeze special case
as we do a proper wait = 1 call in the freeze code anyway.

Also rename the function to xfs_fs_sync_fs to match the normal naming
convention, update comments and avoid calling into the laptop_mode logic on
an error.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_super.c	2009-08-26 20:13:54.609362865 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_super.c	2009-08-26 20:18:36.065357266 -0300
@@ -1144,7 +1144,7 @@ xfs_fs_put_super(
 }
 
 STATIC int
-xfs_fs_sync_super(
+xfs_fs_sync_fs(
 	struct super_block	*sb,
 	int			wait)
 {
@@ -1152,23 +1152,23 @@ xfs_fs_sync_super(
 	int			error;
 
 	/*
-	 * Treat a sync operation like a freeze.  This is to work
-	 * around a race in sync_inodes() which works in two phases
-	 * - an asynchronous flush, which can write out an inode
-	 * without waiting for file size updates to complete, and a
-	 * synchronous flush, which wont do anything because the
-	 * async flush removed the inode's dirty flag.  Also
-	 * sync_inodes() will not see any files that just have
-	 * outstanding transactions to be flushed because we don't
-	 * dirty the Linux inode until after the transaction I/O
-	 * completes.
+	 * Not much we can do for the first async pass.  Writing out the
+	 * superblock would be counter-productive as we are going to redirty
+	 * when writing out other data and metadata (and writing out a single
+	 * block is quite fast anyway).
+	 *
+	 * Try to asynchronously kick off quota syncing at least.
 	 */
-	if (wait || unlikely(sb->s_frozen == SB_FREEZE_WRITE))
-		error = xfs_quiesce_data(mp);
-	else
-		error = xfs_sync_fsdata(mp, 0);
+	if (!wait) {
+		xfs_qm_sync(mp, SYNC_TRYLOCK);
+		return 0;
+	}
+
+	error = xfs_quiesce_data(mp);
+	if (error)
+		return -error;
 
-	if (unlikely(laptop_mode)) {
+	if (laptop_mode) {
 		int	prev_sync_seq = mp->m_sync_seq;
 
 		/*
@@ -1187,7 +1187,7 @@ xfs_fs_sync_super(
 				mp->m_sync_seq != prev_sync_seq);
 	}
 
-	return -error;
+	return 0;
 }
 
 STATIC int
@@ -1561,7 +1561,7 @@ static struct super_operations xfs_super
 	.write_inode		= xfs_fs_write_inode,
 	.clear_inode		= xfs_fs_clear_inode,
 	.put_super		= xfs_fs_put_super,
-	.sync_fs		= xfs_fs_sync_super,
+	.sync_fs		= xfs_fs_sync_fs,
 	.freeze_fs		= xfs_fs_freeze,
 	.statfs			= xfs_fs_statfs,
 	.remount_fs		= xfs_fs_remount,

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/4] [PATCH 5/5] xfs: fix xfs_quiesce_data
  2009-08-27 23:15 [PATCH 0/4] sync improvements Christoph Hellwig
                   ` (2 preceding siblings ...)
  2009-08-27 23:15 ` [PATCH 3/4] [PATCH 2/5] xfs: cleanup ->sync_fs Christoph Hellwig
@ 2009-08-27 23:15 ` Christoph Hellwig
  2009-08-28 23:18   ` Alex Elder
  3 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-08-27 23:15 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-xfs_quiesce_data --]
[-- Type: text/plain, Size: 1359 bytes --]

From: Dave Chinner <david@fromorbit.com>

We need to do a synchronous xfs_sync_fsdata to make sure the superblock
actually is on disk when we return.

Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular
flag is never checked.

Move xfs_filestream_flush call later [hch: why?  seems unrelated].


Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-27 20:06:39.889355294 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-27 20:08:01.169357854 -0300
@@ -426,14 +426,16 @@ xfs_quiesce_data(
 	/* push non-blocking */
 	xfs_sync_data(mp, 0);
 	xfs_qm_sync(mp, SYNC_TRYLOCK);
-	xfs_filestream_flush(mp);
 
-	/* push and block */
+	/* push and block till complete */
 	xfs_sync_data(mp, SYNC_WAIT);
 	xfs_qm_sync(mp, SYNC_WAIT);
 
+	/* drop inode references pinned by filestreams */
+	xfs_filestream_flush(mp);
+
 	/* write superblock and hoover up shutdown errors */
-	error = xfs_sync_fsdata(mp, 0);
+	error = xfs_sync_fsdata(mp, SYNC_WAIT);
 
 	/* flush data-only devices */
 	if (mp->m_rtdev_targp)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 1/4] xfs: mark inodes dirty before issuing I/O
  2009-08-27 23:15 ` [PATCH 1/4] xfs: mark inodes dirty before issuing I/O Christoph Hellwig
@ 2009-08-28 23:17   ` Alex Elder
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Elder @ 2009-08-28 23:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig wrote:
> From: Dave Chinner <david@fromorbit.com>
> 
> To make sure they get properly waited on in sync when I/O is in flight and
> we latter need to update the inode size.  Requires a new helper to check if an
> ioend structure is beyond the current EOF.
> 
> 
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2009-08-26 20:06:07.105357325 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2009-08-26 20:08:09.017355358 -0300
> @@ -186,19 +186,37 @@ xfs_destroy_ioend(
>  }
> 
>  /*
> + * If the end of the current ioend is beyond the current EOF,
> + * return the new EOF value, otherwise zero.
> + */
> +STATIC xfs_fsize_t
> +xfs_ioend_new_eof(
> +	xfs_ioend_t		*ioend)
> +{
> +	xfs_inode_t		*ip = XFS_I(ioend->io_inode);
> +	xfs_fsize_t		isize;
> +	xfs_fsize_t		bsize;
> +
> +	bsize = ioend->io_offset + ioend->io_size;
> +	isize = MAX(ip->i_size, ip->i_new_size);
> +	isize = MIN(isize, bsize);
> +	return isize > ip->i_d.di_size ? isize : 0;
> +}
> +
> +/*
>   * Update on-disk file size now that data has been written to disk.
>   * The current in-memory file size is i_size.  If a write is beyond
>   * eof i_new_size will be the intended file size until i_size is
>   * updated.  If this write does not extend all the way to the valid
>   * file size then restrict this update to the end of the write.
>   */
> +
>  STATIC void
>  xfs_setfilesize(
>  	xfs_ioend_t		*ioend)
>  {
>  	xfs_inode_t		*ip = XFS_I(ioend->io_inode);
>  	xfs_fsize_t		isize;
> -	xfs_fsize_t		bsize;
> 
>  	ASSERT((ip->i_d.di_mode & S_IFMT) == S_IFREG);
>  	ASSERT(ioend->io_type != IOMAP_READ);
> @@ -206,14 +224,9 @@ xfs_setfilesize(
>  	if (unlikely(ioend->io_error))
>  		return;
> 
> -	bsize = ioend->io_offset + ioend->io_size;
> -
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -
> -	isize = MAX(ip->i_size, ip->i_new_size);
> -	isize = MIN(isize, bsize);
> -
> -	if (ip->i_d.di_size < isize) {
> +	isize = xfs_ioend_new_eof(ioend);
> +	if (isize) {
>  		ip->i_d.di_size = isize;
>  		xfs_mark_inode_dirty_sync(ip);
>  	}
> @@ -403,10 +416,16 @@ xfs_submit_ioend_bio(
>  	struct bio	*bio)
>  {
>  	atomic_inc(&ioend->io_remaining);
> -
>  	bio->bi_private = ioend;
>  	bio->bi_end_io = xfs_end_bio;
> 
> +	/*
> +	 * If the I/O is beyond EOF we mark the inode dirty immediately
> +	 * but don't update the inode size until I/O completion.
> +	 */
> +	if (xfs_ioend_new_eof(ioend))
> +		xfs_mark_inode_dirty_sync(XFS_I(ioend->io_inode));
> +
>  	submit_bio(WRITE, bio);
>  	ASSERT(!bio_flagged(bio, BIO_EOPNOTSUPP));
>  	bio_put(bio);
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log
  2009-08-27 23:15 ` [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log Christoph Hellwig
@ 2009-08-28 23:18   ` Alex Elder
  2009-09-03 15:45     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Elder @ 2009-08-28 23:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig wrote:
> From: Dave Chinner <david@fromorbit.com>
> 
> We want to always cover the log after writing out the superblock, and
> in case of a synchronous writeout make sure we actually wait for the
> log to be covered.  That way a filesystem that has been sync()ed can
> be considered clean by log recovery.
> 
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>



Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>



> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-07-07 16:03:51.871239376 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-07-07 18:24:48.722251281 +0200
> @@ -309,11 +309,15 @@ xfs_sync_attr(
>  STATIC int
>  xfs_commit_dummy_trans(
>  	struct xfs_mount	*mp,
> -	uint			log_flags)
> +	uint			flags)
>  {
>  	struct xfs_inode	*ip = mp->m_rootip;
>  	struct xfs_trans	*tp;
>  	int			error;
> +	int			log_flags = XFS_LOG_FORCE;
> +
> +	if (flags & SYNC_WAIT)
> +		log_flags |= XFS_LOG_SYNC;
> 
>  	/*
>  	 * Put a dummy transaction in the log to tell recovery
> @@ -331,13 +335,12 @@ xfs_commit_dummy_trans(
>  	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ihold(tp, ip);
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> -	/* XXX(hch): ignoring the error here.. */
>  	error = xfs_trans_commit(tp, 0);
> -
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> 
> +	/* the log force ensures this transaction is pushed to disk */
>  	xfs_log_force(mp, 0, log_flags);
> -	return 0;
> +	return error;
>  }
> 
>  int
> @@ -377,6 +380,7 @@ xfs_sync_fsdata(
>  		 */
>  		if (XFS_BUF_ISPINNED(bp))
>  			xfs_log_force(mp, 0, XFS_LOG_FORCE);
> +		xfs_flush_buftarg(mp->m_ddev_targp, 1);
>  	}
> 
> 
> @@ -385,7 +389,10 @@ xfs_sync_fsdata(
>  	else
>  		XFS_BUF_ASYNC(bp);
> 
> -	return xfs_bwrite(mp, bp);
> +	error = xfs_bwrite(mp, bp);
> +	if (!error && xfs_log_need_covered(mp))
> +		error = xfs_commit_dummy_trans(mp, (flags & SYNC_WAIT));
> +	return error;
> 
>   out_brelse:
>  	xfs_buf_relse(bp);
> @@ -571,8 +578,6 @@ xfs_sync_worker(
>  		/* dgc: errors ignored here */
>  		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
>  		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
> -		if (xfs_log_need_covered(mp))
> -			error = xfs_commit_dummy_trans(mp, XFS_LOG_FORCE);
>  	}
>  	mp->m_sync_seq++;
>  	wake_up(&mp->m_wait_single_sync_task);
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 3/4] [PATCH 2/5] xfs: cleanup ->sync_fs
  2009-08-27 23:15 ` [PATCH 3/4] [PATCH 2/5] xfs: cleanup ->sync_fs Christoph Hellwig
@ 2009-08-28 23:18   ` Alex Elder
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Elder @ 2009-08-28 23:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig wrote:
> Sort out ->sync_fs to not perform a superblock writeback for the wait = 0 case
> as that is just an optional first pass and the superblock will be written back
> properly in the next call with wait = 1.  Instead perform an opportunistic
> quota writeback to have less work later.  Also remove the freeze special case
> as we do a proper wait = 1 call in the freeze code anyway.
> 
> Also rename the function to xfs_fs_sync_fs to match the normal naming
> convention, update comments and avoid calling into the laptop_mode logic on
> an error.
> 
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>
 
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_super.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_super.c	2009-08-26 20:13:54.609362865 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_super.c	2009-08-26 20:18:36.065357266 -0300
> @@ -1144,7 +1144,7 @@ xfs_fs_put_super(
>  }
> 
>  STATIC int
> -xfs_fs_sync_super(
> +xfs_fs_sync_fs(
>  	struct super_block	*sb,
>  	int			wait)
>  {
> @@ -1152,23 +1152,23 @@ xfs_fs_sync_super(
>  	int			error;
> 
>  	/*
> -	 * Treat a sync operation like a freeze.  This is to work
> -	 * around a race in sync_inodes() which works in two phases
> -	 * - an asynchronous flush, which can write out an inode
> -	 * without waiting for file size updates to complete, and a
> -	 * synchronous flush, which wont do anything because the
> -	 * async flush removed the inode's dirty flag.  Also
> -	 * sync_inodes() will not see any files that just have
> -	 * outstanding transactions to be flushed because we don't
> -	 * dirty the Linux inode until after the transaction I/O
> -	 * completes.
> +	 * Not much we can do for the first async pass.  Writing out the
> +	 * superblock would be counter-productive as we are going to redirty
> +	 * when writing out other data and metadata (and writing out a single
> +	 * block is quite fast anyway).
> +	 *
> +	 * Try to asynchronously kick off quota syncing at least.
>  	 */
> -	if (wait || unlikely(sb->s_frozen == SB_FREEZE_WRITE))
> -		error = xfs_quiesce_data(mp);
> -	else
> -		error = xfs_sync_fsdata(mp, 0);
> +	if (!wait) {
> +		xfs_qm_sync(mp, SYNC_TRYLOCK);
> +		return 0;
> +	}
> +
> +	error = xfs_quiesce_data(mp);
> +	if (error)
> +		return -error;
> 
> -	if (unlikely(laptop_mode)) {
> +	if (laptop_mode) {
>  		int	prev_sync_seq = mp->m_sync_seq;
> 
>  		/*
> @@ -1187,7 +1187,7 @@ xfs_fs_sync_super(
>  				mp->m_sync_seq != prev_sync_seq);
>  	}
> 
> -	return -error;
> +	return 0;
>  }
> 
>  STATIC int
> @@ -1561,7 +1561,7 @@ static struct super_operations xfs_super
>  	.write_inode		= xfs_fs_write_inode,
>  	.clear_inode		= xfs_fs_clear_inode,
>  	.put_super		= xfs_fs_put_super,
> -	.sync_fs		= xfs_fs_sync_super,
> +	.sync_fs		= xfs_fs_sync_fs,
>  	.freeze_fs		= xfs_fs_freeze,
>  	.statfs			= xfs_fs_statfs,
>  	.remount_fs		= xfs_fs_remount,
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 4/4] [PATCH 5/5] xfs: fix xfs_quiesce_data
  2009-08-27 23:15 ` [PATCH 4/4] [PATCH 5/5] xfs: fix xfs_quiesce_data Christoph Hellwig
@ 2009-08-28 23:18   ` Alex Elder
  2009-09-14  0:37     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Elder @ 2009-08-28 23:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig wrote:
> From: Dave Chinner <david@fromorbit.com>
> 
> We need to do a synchronous xfs_sync_fsdata to make sure the superblock
> actually is on disk when we return.
> 
> Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular
> flag is never checked.
> 
> Move xfs_filestream_flush call later [hch: why?  seems unrelated].

I concur with your question.  Why not release the inode references early?

> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-08-27 20:06:39.889355294 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2009-08-27 20:08:01.169357854 -0300
> @@ -426,14 +426,16 @@ xfs_quiesce_data(
>  	/* push non-blocking */
>  	xfs_sync_data(mp, 0);
>  	xfs_qm_sync(mp, SYNC_TRYLOCK);
> -	xfs_filestream_flush(mp);
> 
> -	/* push and block */
> +	/* push and block till complete */
>  	xfs_sync_data(mp, SYNC_WAIT);
>  	xfs_qm_sync(mp, SYNC_WAIT);
> 
> +	/* drop inode references pinned by filestreams */
> +	xfs_filestream_flush(mp);
> +
>  	/* write superblock and hoover up shutdown errors */
> -	error = xfs_sync_fsdata(mp, 0);
> +	error = xfs_sync_fsdata(mp, SYNC_WAIT);
> 
>  	/* flush data-only devices */
>  	if (mp->m_rtdev_targp)
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log
  2009-08-28 23:18   ` Alex Elder
@ 2009-09-03 15:45     ` Christoph Hellwig
  2009-09-11 19:29       ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-09-03 15:45 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs


FYI: I found some nasty deadlock in this on a large machine, please
hold back until I've sorted it out.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log
  2009-09-03 15:45     ` Christoph Hellwig
@ 2009-09-11 19:29       ` Christoph Hellwig
  2009-09-12  3:32         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-09-11 19:29 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Thu, Sep 03, 2009 at 11:45:52AM -0400, Christoph Hellwig wrote:
> 
> FYI: I found some nasty deadlock in this on a large machine, please
> hold back until I've sorted it out.

Turns out it's the following:

Thread A is in xfs_sync_fsdata from sys_sync, flushing the workqueues while
holding b_sema of the superblock:

[78901.232282] Call Trace:
[78901.232282]  [<c08700b5>] schedule_timeout+0x155/0x190
[78901.232282]  [<c086f4f1>] wait_for_common+0x101/0x120
[78901.232282]  [<c086f5a2>] wait_for_completion+0x12/0x20
[78901.232282]  [<c01672ac>] flush_cpu_workqueue+0x3c/0x70
[78901.232282]  [<c01679ae>] flush_workqueue+0x7e/0xa0
[78901.232282]  [<c04ab409>] xfs_flush_buftarg+0x19/0x170
[78901.232282]  [<c04fe9c8>] xfs_sync_fsdata+0xb8/0x150
[78901.232282]  [<c04fef55>] xfs_quiesce_data+0x45/0x70
[78901.232282]  [<c04d8810>] xfs_fs_sync_fs+0x20/0xd0
[78901.232282]  [<c0206539>] __sync_filesystem+0x39/0x60
[78901.232282]  [<c020663b>] sync_filesystems+0xdb/0x110
[78901.232282]  [<c02066bb>] sys_sync+0x1b/0x40


This causes a wakeup of xfsconvertd
performing outstanding unwritten extent conversions:

[32160.551805] Call Trace:
[32160.553843]  [<c08700b5>] schedule_timeout+0x155/0x190
[32160.556965]  [<c0870f80>] __down+0x50/0x80
[32160.557838]  [<c01703ae>] down+0x3e/0x40
[32160.559675]  [<c04aa6a2>] xfs_buf_lock+0x32/0xe0
[32160.560795]  [<c0495e15>] xfs_getsb+0x45/0x90
[32160.561700]  [<c049ebf1>] xfs_trans_getsb+0x91/0x180
[32160.562723]  [<c049c0f5>] xfs_trans_apply_sb_deltas+0x15/0x450
[32160.564995]  [<c049c611>] _xfs_trans_commit+0xe1/0x410
[32160.570459]  [<c04869ec>] xfs_iomap_write_unwritten+0x1cc/0x300
[32160.571678]  [<c04a7da2>] xfs_end_bio_unwritten+0x62/0x70
[32160.573007]  [<c0166c7d>] worker_thread+0x18d/0x280
[32160.577650]  [<c0166af0>] ? worker_thread+0x0/0x280
[32160.578666]  [<c016b3ec>] kthread+0x7c/0x90

Which we already hold in the Thread A.

I don't really see why we have to do these waits at all - xfsdatad and
xfsconvertd are for data I/O completion and not buffers, and we already
track their completion for data integrity syncs using the per-inode
iocount that we wait for during the data writeout.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log
  2009-09-11 19:29       ` Christoph Hellwig
@ 2009-09-12  3:32         ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2009-09-12  3:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, Alex Elder

On Fri, Sep 11, 2009 at 03:29:04PM -0400, Christoph Hellwig wrote:
> On Thu, Sep 03, 2009 at 11:45:52AM -0400, Christoph Hellwig wrote:
> > 
> > FYI: I found some nasty deadlock in this on a large machine, please
> > hold back until I've sorted it out.
> 
> Turns out it's the following:
> 
> Thread A is in xfs_sync_fsdata from sys_sync, flushing the workqueues while
> holding b_sema of the superblock:
> 
> [78901.232282] Call Trace:
> [78901.232282]  [<c08700b5>] schedule_timeout+0x155/0x190
> [78901.232282]  [<c086f4f1>] wait_for_common+0x101/0x120
> [78901.232282]  [<c086f5a2>] wait_for_completion+0x12/0x20
> [78901.232282]  [<c01672ac>] flush_cpu_workqueue+0x3c/0x70
> [78901.232282]  [<c01679ae>] flush_workqueue+0x7e/0xa0
> [78901.232282]  [<c04ab409>] xfs_flush_buftarg+0x19/0x170
> [78901.232282]  [<c04fe9c8>] xfs_sync_fsdata+0xb8/0x150
> [78901.232282]  [<c04fef55>] xfs_quiesce_data+0x45/0x70
> [78901.232282]  [<c04d8810>] xfs_fs_sync_fs+0x20/0xd0
> [78901.232282]  [<c0206539>] __sync_filesystem+0x39/0x60
> [78901.232282]  [<c020663b>] sync_filesystems+0xdb/0x110
> [78901.232282]  [<c02066bb>] sys_sync+0x1b/0x40
> 
> 
> This causes a wakeup of xfsconvertd
> performing outstanding unwritten extent conversions:
> 
> [32160.551805] Call Trace:
> [32160.553843]  [<c08700b5>] schedule_timeout+0x155/0x190
> [32160.556965]  [<c0870f80>] __down+0x50/0x80
> [32160.557838]  [<c01703ae>] down+0x3e/0x40
> [32160.559675]  [<c04aa6a2>] xfs_buf_lock+0x32/0xe0
> [32160.560795]  [<c0495e15>] xfs_getsb+0x45/0x90
> [32160.561700]  [<c049ebf1>] xfs_trans_getsb+0x91/0x180
> [32160.562723]  [<c049c0f5>] xfs_trans_apply_sb_deltas+0x15/0x450
> [32160.564995]  [<c049c611>] _xfs_trans_commit+0xe1/0x410
> [32160.570459]  [<c04869ec>] xfs_iomap_write_unwritten+0x1cc/0x300
> [32160.571678]  [<c04a7da2>] xfs_end_bio_unwritten+0x62/0x70
> [32160.573007]  [<c0166c7d>] worker_thread+0x18d/0x280
> [32160.577650]  [<c0166af0>] ? worker_thread+0x0/0x280
> [32160.578666]  [<c016b3ec>] kthread+0x7c/0x90
> 
> Which we already hold in the Thread A.
> 
> I don't really see why we have to do these waits at all - xfsdatad and
> xfsconvertd are for data I/O completion and not buffers, and we already
> track their completion for data integrity syncs using the per-inode
> iocount that we wait for during the data writeout.

Basically the log covering code should only do anything if the
filesystem is otherwise idle - if a sync is running with concurrent
changes then we're not going to be able to cover the log, nor do we
need to because the concurrent transactions have the same effect as
covering the log - writing another record that ensures the log head
and tail are up to date on disk.

The issue here is that some other data IO completion not covered
by the sync() call is running a new transaction that modifies the
superblock, and it can't get the lock. I'd suggest that the
xfs_flush_buftarg() cal needs to be moved until after the superblock
write but before the cover check. That way the superblock will be
unlocked (due to IO completion) and the above xfsconvertd stack will
make progress and prevent the deadlock.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] [PATCH 5/5] xfs: fix xfs_quiesce_data
  2009-08-28 23:18   ` Alex Elder
@ 2009-09-14  0:37     ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2009-09-14  0:37 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs

On Fri, Aug 28, 2009 at 06:18:45PM -0500, Alex Elder wrote:
> Christoph Hellwig wrote:
> > From: Dave Chinner <david@fromorbit.com>
> > 
> > We need to do a synchronous xfs_sync_fsdata to make sure the superblock
> > actually is on disk when we return.
> > 
> > Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular
> > flag is never checked.
> > 
> > Move xfs_filestream_flush call later [hch: why?  seems unrelated].
> 
> I concur with your question.  Why not release the inode references early?

Because if you drop the filestreams references before you write the
data, the data doesn't get put where the filestreams allocator
decided it should go when it created the reference....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log
  2009-09-16 13:18 [PATCH 0/4] sync fixes again Christoph Hellwig
@ 2009-09-16 13:18 ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-09-16 13:18 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-xfs_sync_fsdata --]
[-- Type: text/plain, Size: 2469 bytes --]

From: Dave Chinner <david@fromorbit.com>

We want to always cover the log after writing out the superblock, and
in case of a synchronous writeout make sure we actually wait for the
log to be covered.  That way a filesystem that has been sync()ed can
be considered clean by log recovery.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-09-16 10:04:30.869278510 -0300
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2009-09-16 10:04:41.193004640 -0300
@@ -309,11 +309,15 @@ xfs_sync_attr(
 STATIC int
 xfs_commit_dummy_trans(
 	struct xfs_mount	*mp,
-	uint			log_flags)
+	uint			flags)
 {
 	struct xfs_inode	*ip = mp->m_rootip;
 	struct xfs_trans	*tp;
 	int			error;
+	int			log_flags = XFS_LOG_FORCE;
+
+	if (flags & SYNC_WAIT)
+		log_flags |= XFS_LOG_SYNC;
 
 	/*
 	 * Put a dummy transaction in the log to tell recovery
@@ -331,16 +335,15 @@ xfs_commit_dummy_trans(
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_ihold(tp, ip);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	/* XXX(hch): ignoring the error here.. */
 	error = xfs_trans_commit(tp, 0);
-
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
+	/* the log force ensures this transaction is pushed to disk */
 	xfs_log_force(mp, 0, log_flags);
-	return 0;
+	return error;
 }
 
-int
+int
 xfs_sync_fsdata(
 	struct xfs_mount	*mp,
 	int			flags)
@@ -385,7 +388,20 @@ xfs_sync_fsdata(
 	else
 		XFS_BUF_ASYNC(bp);
 
-	return xfs_bwrite(mp, bp);
+	error = xfs_bwrite(mp, bp);
+	if (error)
+		return error;
+
+	/*
+	 * If this is a data integrity sync make sure all pending buffers
+	 * are flushed out for the log coverage check below.
+	 */
+	if (flags & SYNC_WAIT)
+		xfs_flush_buftarg(mp->m_ddev_targp, 1);
+
+	if (xfs_log_need_covered(mp))
+		error = xfs_commit_dummy_trans(mp, flags);
+	return error;
 
  out_brelse:
 	xfs_buf_relse(bp);
@@ -572,8 +588,6 @@ xfs_sync_worker(
 		/* dgc: errors ignored here */
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
 		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
-		if (xfs_log_need_covered(mp))
-			error = xfs_commit_dummy_trans(mp, XFS_LOG_FORCE);
 	}
 	mp->m_sync_seq++;
 	wake_up(&mp->m_wait_single_sync_task);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-09-16 13:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-27 23:15 [PATCH 0/4] sync improvements Christoph Hellwig
2009-08-27 23:15 ` [PATCH 1/4] xfs: mark inodes dirty before issuing I/O Christoph Hellwig
2009-08-28 23:17   ` Alex Elder
2009-08-27 23:15 ` [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log Christoph Hellwig
2009-08-28 23:18   ` Alex Elder
2009-09-03 15:45     ` Christoph Hellwig
2009-09-11 19:29       ` Christoph Hellwig
2009-09-12  3:32         ` Dave Chinner
2009-08-27 23:15 ` [PATCH 3/4] [PATCH 2/5] xfs: cleanup ->sync_fs Christoph Hellwig
2009-08-28 23:18   ` Alex Elder
2009-08-27 23:15 ` [PATCH 4/4] [PATCH 5/5] xfs: fix xfs_quiesce_data Christoph Hellwig
2009-08-28 23:18   ` Alex Elder
2009-09-14  0:37     ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2009-09-16 13:18 [PATCH 0/4] sync fixes again Christoph Hellwig
2009-09-16 13:18 ` [PATCH 2/4] xfs: make sure xfs_sync_fsdata covers the log Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox