[PATCH 0/2] make sure to always update the inode size on umount

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] make sure to always update the inode size on umount
@ 2011-08-27  5:57 Christoph Hellwig
  2011-08-27  5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig
  2011-08-27  5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2011-08-27  5:57 UTC (permalink / raw)
  To: xfs

This series addresses an issue where XFS might not actually flush the
inode size if we wrote out the inode data during umount, which was
recently reported on #xfs on irc.  (If the reported is active on the
list please chime in so that I can add your reported-by tag!).

The first patch is the guts of the fix, and the second fixes various
issues with ->write_inode that I found during the audit.  To me patch
1 is a clear Linux 3.1 candidate, and I would tend to nominate patch 2
as well.

I think we also have some issue with the VFS writeback code not handling
inode redirtying from the I/O completion handler nicely, but I'll send
another patch for that.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount
  2011-08-27  5:57 [PATCH 0/2] make sure to always update the inode size on umount Christoph Hellwig
@ 2011-08-27  5:57 ` Christoph Hellwig
  2011-08-30  6:24   ` Dave Chinner
  2011-08-27  5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2011-08-27  5:57 UTC (permalink / raw)
  To: xfs

During umount we do not add a dirty inode to the lru and wait for it to
become clean first, but force writeback of data and metadata with
I_WILL_FREE set.  Currently there is no way for XFS to detect that the
inode has been redirtied for metadata operations, as we skip the
mark_inode_dirty call during teardown.  Fix this by setting i_update_core
nanually in that case, so that the inode gets flushed during inode reclaim.

Alternatively we could enable calling mark_inode_dirty for inodes in
I_WILL_FREE state, and let the VFS dirty tracking handle this.  I decided
against this as we will get better I/O patterns from reclaim compared to
the synchronous writeout in write_inode_now, and always marking the inode
dirty in some way from xfs_mark_inode_dirty is a better safetly net in
either case.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/xfs_iops.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iops.c	2011-08-26 12:31:19.090631739 +0200
+++ linux-2.6/fs/xfs/xfs_iops.c	2011-08-26 12:35:43.692531800 +0200
@@ -70,9 +70,8 @@ xfs_synchronize_times(
 }

 /*
- * If the linux inode is valid, mark it dirty.
- * Used when committing a dirty inode into a transaction so that
- * the inode will get written back by the linux code
+ * If the linux inode is valid, mark it dirty, else mark the dirty state
+ * in the XFS inode to make sure we pick it up when reclaiming the inode.
  */
 void
 xfs_mark_inode_dirty_sync(
@@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync(

 	if (!(inode->i_state & (I_WILL_FREE|I_FREEING)))
 		mark_inode_dirty_sync(inode);
+	else {
+		barrier();
+		ip->i_update_core = 1;
+	}
 }

 void
@@ -92,6 +95,11 @@ xfs_mark_inode_dirty(

 	if (!(inode->i_state & (I_WILL_FREE|I_FREEING)))
 		mark_inode_dirty(inode);
+	else {
+		barrier();
+		ip->i_update_core = 1;
+	}
+
 }

 /*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount
  2011-08-27  5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig
@ 2011-08-30  6:24   ` Dave Chinner
  2011-08-30  6:39     ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2011-08-30  6:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sat, Aug 27, 2011 at 01:57:44AM -0400, Christoph Hellwig wrote:
> During umount we do not add a dirty inode to the lru and wait for it to
> become clean first, but force writeback of data and metadata with
> I_WILL_FREE set.  Currently there is no way for XFS to detect that the
> inode has been redirtied for metadata operations, as we skip the
> mark_inode_dirty call during teardown.  Fix this by setting i_update_core
> nanually in that case, so that the inode gets flushed during inode reclaim.
> 
> Alternatively we could enable calling mark_inode_dirty for inodes in
> I_WILL_FREE state, and let the VFS dirty tracking handle this.  I decided
> against this as we will get better I/O patterns from reclaim compared to
> the synchronous writeout in write_inode_now, and always marking the inode
> dirty in some way from xfs_mark_inode_dirty is a better safetly net in
> either case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/xfs/xfs_iops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_iops.c	2011-08-26 12:31:19.090631739 +0200
> +++ linux-2.6/fs/xfs/xfs_iops.c	2011-08-26 12:35:43.692531800 +0200
> @@ -70,9 +70,8 @@ xfs_synchronize_times(
>  }
>  
>  /*
> - * If the linux inode is valid, mark it dirty.
> - * Used when committing a dirty inode into a transaction so that
> - * the inode will get written back by the linux code
> + * If the linux inode is valid, mark it dirty, else mark the dirty state
> + * in the XFS inode to make sure we pick it up when reclaiming the inode.
>   */
>  void
>  xfs_mark_inode_dirty_sync(
> @@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync(
>  
>  	if (!(inode->i_state & (I_WILL_FREE|I_FREEING)))
>  		mark_inode_dirty_sync(inode);
> +	else {
> +		barrier();
> +		ip->i_update_core = 1;
> +	}
>  }

Why the barrier()? Isn't that just a compiler barrier? If you are
worried about catching the update vs clearing it in transaction
commit, shouldn't that use smp_mb() instead (in both places)?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount
  2011-08-30  6:24   ` Dave Chinner
@ 2011-08-30  6:39     ` Christoph Hellwig
  2011-08-30  7:20       ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2011-08-30  6:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Tue, Aug 30, 2011 at 04:24:16PM +1000, Dave Chinner wrote:
> >  xfs_mark_inode_dirty_sync(
> > @@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync(
> >  
> >  	if (!(inode->i_state & (I_WILL_FREE|I_FREEING)))
> >  		mark_inode_dirty_sync(inode);
> > +	else {
> > +		barrier();
> > +		ip->i_update_core = 1;
> > +	}
> >  }
> 
> Why the barrier()? Isn't that just a compiler barrier? If you are
> worried about catching the update vs clearing it in transaction
> commit, shouldn't that use smp_mb() instead (in both places)?

It's a blind copy & past from xfs_fs_dirty_inode.  The comments
there suggests it is for update ordering.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount
  2011-08-30  6:39     ` Christoph Hellwig
@ 2011-08-30  7:20       ` Dave Chinner
  2011-08-30  7:27         ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2011-08-30  7:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Aug 30, 2011 at 02:39:49AM -0400, Christoph Hellwig wrote:
> On Tue, Aug 30, 2011 at 04:24:16PM +1000, Dave Chinner wrote:
> > >  xfs_mark_inode_dirty_sync(
> > > @@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync(
> > >  
> > >  	if (!(inode->i_state & (I_WILL_FREE|I_FREEING)))
> > >  		mark_inode_dirty_sync(inode);
> > > +	else {
> > > +		barrier();
> > > +		ip->i_update_core = 1;
> > > +	}
> > >  }
> > 
> > Why the barrier()? Isn't that just a compiler barrier? If you are
> > worried about catching the update vs clearing it in transaction
> > commit, shouldn't that use smp_mb() instead (in both places)?
> 
> It's a blind copy & past from xfs_fs_dirty_inode.  The comments
> there suggests it is for update ordering.

Right, I've always wondered about that, because the corresponding
code talks about requiring strongly ordered memory semantics and
that the compiler does this via SYNCHRONIZE() (barrier).

>From xfs_inode_item_format:

         * We clear i_update_core before copying out the data.
         * This is for coordination with our timestamp updates
         * that don't hold the inode lock. They will always
         * update the timestamps BEFORE setting i_update_core,
         * so if we clear i_update_core after they set it we
         * are guaranteed to see their updates to the timestamps
         * either here.  Likewise, if they set it after we clear it
         * here, we'll see it either on the next commit of this
         * inode or the next time the inode gets flushed via
         * xfs_iflush().  This depends on strongly ordered memory
         * semantics, but we have that.  We use the SYNCHRONIZE
         * macro to make sure that the compiler does not reorder
         * the i_update_core access below the data copy below.
         */
        if (ip->i_update_core)  {
                ip->i_update_core = 0;
                SYNCHRONIZE();
        }

Now that may have been true on Irix/MIPS which had strong memory
ordering so only compiler barriers were necessary.

However, normally when we talk about ordered memory semantics in
Linux, we cannot assume strong ordering - if we have ordering
requirements, we have to guarantee ordering by explicit use of
memory barriers, right?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount
  2011-08-30  7:20       ` Dave Chinner
@ 2011-08-30  7:27         ` Christoph Hellwig
  2011-08-31 22:51           ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2011-08-30  7:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Tue, Aug 30, 2011 at 05:20:13PM +1000, Dave Chinner wrote:
> Now that may have been true on Irix/MIPS which had strong memory
> ordering so only compiler barriers were necessary.
> 
> However, normally when we talk about ordered memory semantics in
> Linux, we cannot assume strong ordering - if we have ordering
> requirements, we have to guarantee ordering by explicit use of
> memory barriers, right?

Probably.  But I'm not worried about that so much, it's just timestamps
we're talking about as the size already has the ilock unlock as full
barrier, and we're going to kill this code soon anyway.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount
  2011-08-30  7:27         ` Christoph Hellwig
@ 2011-08-31 22:51           ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2011-08-31 22:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Aug 30, 2011 at 03:27:21AM -0400, Christoph Hellwig wrote:
> On Tue, Aug 30, 2011 at 05:20:13PM +1000, Dave Chinner wrote:
> > Now that may have been true on Irix/MIPS which had strong memory
> > ordering so only compiler barriers were necessary.
> > 
> > However, normally when we talk about ordered memory semantics in
> > Linux, we cannot assume strong ordering - if we have ordering
> > requirements, we have to guarantee ordering by explicit use of
> > memory barriers, right?
> 
> Probably.  But I'm not worried about that so much, it's just timestamps
> we're talking about as the size already has the ilock unlock as full
> barrier, and we're going to kill this code soon anyway.

Fair enough.

Consider it:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] xfs: fix ->write_inode return values
  2011-08-27  5:57 [PATCH 0/2] make sure to always update the inode size on umount Christoph Hellwig
  2011-08-27  5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig
@ 2011-08-27  5:57 ` Christoph Hellwig
  2011-08-30  6:25   ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2011-08-27  5:57 UTC (permalink / raw)
  To: xfs

Currently we always redirty an inode that was attempted to be written out
synchronously but has been cleaned by an AIL pushed internall, which is
rather bogus.  Fix that by doing the i_update_core check early on and
return 0 for it.  Also include async calls for it, as doing any work for
those is just as pointless.  While we're at it also fix the sign for the
EIO return in case of a filesystem shutdown, and fix the completely
non-sensical locking around xfs_log_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/xfs_super.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_super.c	2011-08-26 14:47:54.320317391 +0200
+++ linux-2.6/fs/xfs/xfs_super.c	2011-08-26 14:51:46.095728421 +0200
@@ -877,33 +877,17 @@ xfs_log_inode(
 	struct xfs_trans	*tp;
 	int			error;
 
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 	tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
 	error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
-
 	if (error) {
 		xfs_trans_cancel(tp, 0);
-		/* we need to return with the lock hold shared */
-		xfs_ilock(ip, XFS_ILOCK_SHARED);
 		return error;
 	}
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-
-	/*
-	 * Note - it's possible that we might have pushed ourselves out of the
-	 * way during trans_reserve which would flush the inode.  But there's
-	 * no guarantee that the inode buffer has actually gone out yet (it's
-	 * delwri).  Plus the buffer could be pinned anyway if it's part of
-	 * an inode in another recent transaction.  So we play it safe and
-	 * fire off the transaction anyway.
-	 */
-	xfs_trans_ijoin(tp, ip);
+	xfs_trans_ijoin_ref(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	error = xfs_trans_commit(tp, 0);
-	xfs_ilock_demote(ip, XFS_ILOCK_EXCL);
-
-	return error;
+	return xfs_trans_commit(tp, 0);
 }
 
 STATIC int
@@ -918,7 +902,9 @@ xfs_fs_write_inode(
 	trace_xfs_write_inode(ip);
 
 	if (XFS_FORCED_SHUTDOWN(mp))
-		return XFS_ERROR(EIO);
+		return -XFS_ERROR(EIO);
+	if (!ip->i_update_core)
+		return 0;
 
 	if (wbc->sync_mode == WB_SYNC_ALL) {
 		/*
@@ -929,12 +915,10 @@ xfs_fs_write_inode(
 		 * of synchronous log foces dramatically.
 		 */
 		xfs_ioend_wait(ip);
-		xfs_ilock(ip, XFS_ILOCK_SHARED);
-		if (ip->i_update_core) {
-			error = xfs_log_inode(ip);
-			if (error)
-				goto out_unlock;
-		}
+		error = xfs_log_inode(ip);
+		if (error)
+			goto out;
+		return 0;
 	} else {
 		/*
 		 * We make this non-blocking if the inode is contended, return

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] xfs: fix ->write_inode return values
  2011-08-27  5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig
@ 2011-08-30  6:25   ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2011-08-30  6:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sat, Aug 27, 2011 at 01:57:55AM -0400, Christoph Hellwig wrote:
> Currently we always redirty an inode that was attempted to be written out
> synchronously but has been cleaned by an AIL pushed internall, which is
> rather bogus.  Fix that by doing the i_update_core check early on and
> return 0 for it.  Also include async calls for it, as doing any work for
> those is just as pointless.  While we're at it also fix the sign for the
> EIO return in case of a filesystem shutdown, and fix the completely
> non-sensical locking around xfs_log_inode.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Makes sense.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-08-31 22:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-27  5:57 [PATCH 0/2] make sure to always update the inode size on umount Christoph Hellwig
2011-08-27  5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig
2011-08-30  6:24   ` Dave Chinner
2011-08-30  6:39     ` Christoph Hellwig
2011-08-30  7:20       ` Dave Chinner
2011-08-30  7:27         ` Christoph Hellwig
2011-08-31 22:51           ` Dave Chinner
2011-08-27  5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig
2011-08-30  6:25   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox