* [PATCH 0/2] make sure to always update the inode size on umount @ 2011-08-27 5:57 Christoph Hellwig 2011-08-27 5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig 2011-08-27 5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig 0 siblings, 2 replies; 9+ messages in thread From: Christoph Hellwig @ 2011-08-27 5:57 UTC (permalink / raw) To: xfs This series addresses an issue where XFS might not actually flush the inode size if we wrote out the inode data during umount, which was recently reported on #xfs on irc. (If the reported is active on the list please chime in so that I can add your reported-by tag!). The first patch is the guts of the fix, and the second fixes various issues with ->write_inode that I found during the audit. To me patch 1 is a clear Linux 3.1 candidate, and I would tend to nominate patch 2 as well. I think we also have some issue with the VFS writeback code not handling inode redirtying from the I/O completion handler nicely, but I'll send another patch for that. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount 2011-08-27 5:57 [PATCH 0/2] make sure to always update the inode size on umount Christoph Hellwig @ 2011-08-27 5:57 ` Christoph Hellwig 2011-08-30 6:24 ` Dave Chinner 2011-08-27 5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig 1 sibling, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2011-08-27 5:57 UTC (permalink / raw) To: xfs During umount we do not add a dirty inode to the lru and wait for it to become clean first, but force writeback of data and metadata with I_WILL_FREE set. Currently there is no way for XFS to detect that the inode has been redirtied for metadata operations, as we skip the mark_inode_dirty call during teardown. Fix this by setting i_update_core nanually in that case, so that the inode gets flushed during inode reclaim. Alternatively we could enable calling mark_inode_dirty for inodes in I_WILL_FREE state, and let the VFS dirty tracking handle this. I decided against this as we will get better I/O patterns from reclaim compared to the synchronous writeout in write_inode_now, and always marking the inode dirty in some way from xfs_mark_inode_dirty is a better safetly net in either case. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: linux-2.6/fs/xfs/xfs_iops.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_iops.c 2011-08-26 12:31:19.090631739 +0200 +++ linux-2.6/fs/xfs/xfs_iops.c 2011-08-26 12:35:43.692531800 +0200 @@ -70,9 +70,8 @@ xfs_synchronize_times( } /* - * If the linux inode is valid, mark it dirty. - * Used when committing a dirty inode into a transaction so that - * the inode will get written back by the linux code + * If the linux inode is valid, mark it dirty, else mark the dirty state + * in the XFS inode to make sure we pick it up when reclaiming the inode. */ void xfs_mark_inode_dirty_sync( @@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync( if (!(inode->i_state & (I_WILL_FREE|I_FREEING))) mark_inode_dirty_sync(inode); + else { + barrier(); + ip->i_update_core = 1; + } } void @@ -92,6 +95,11 @@ xfs_mark_inode_dirty( if (!(inode->i_state & (I_WILL_FREE|I_FREEING))) mark_inode_dirty(inode); + else { + barrier(); + ip->i_update_core = 1; + } + } /* _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount 2011-08-27 5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig @ 2011-08-30 6:24 ` Dave Chinner 2011-08-30 6:39 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: Dave Chinner @ 2011-08-30 6:24 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs On Sat, Aug 27, 2011 at 01:57:44AM -0400, Christoph Hellwig wrote: > During umount we do not add a dirty inode to the lru and wait for it to > become clean first, but force writeback of data and metadata with > I_WILL_FREE set. Currently there is no way for XFS to detect that the > inode has been redirtied for metadata operations, as we skip the > mark_inode_dirty call during teardown. Fix this by setting i_update_core > nanually in that case, so that the inode gets flushed during inode reclaim. > > Alternatively we could enable calling mark_inode_dirty for inodes in > I_WILL_FREE state, and let the VFS dirty tracking handle this. I decided > against this as we will get better I/O patterns from reclaim compared to > the synchronous writeout in write_inode_now, and always marking the inode > dirty in some way from xfs_mark_inode_dirty is a better safetly net in > either case. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > Index: linux-2.6/fs/xfs/xfs_iops.c > =================================================================== > --- linux-2.6.orig/fs/xfs/xfs_iops.c 2011-08-26 12:31:19.090631739 +0200 > +++ linux-2.6/fs/xfs/xfs_iops.c 2011-08-26 12:35:43.692531800 +0200 > @@ -70,9 +70,8 @@ xfs_synchronize_times( > } > > /* > - * If the linux inode is valid, mark it dirty. > - * Used when committing a dirty inode into a transaction so that > - * the inode will get written back by the linux code > + * If the linux inode is valid, mark it dirty, else mark the dirty state > + * in the XFS inode to make sure we pick it up when reclaiming the inode. > */ > void > xfs_mark_inode_dirty_sync( > @@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync( > > if (!(inode->i_state & (I_WILL_FREE|I_FREEING))) > mark_inode_dirty_sync(inode); > + else { > + barrier(); > + ip->i_update_core = 1; > + } > } Why the barrier()? Isn't that just a compiler barrier? If you are worried about catching the update vs clearing it in transaction commit, shouldn't that use smp_mb() instead (in both places)? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount 2011-08-30 6:24 ` Dave Chinner @ 2011-08-30 6:39 ` Christoph Hellwig 2011-08-30 7:20 ` Dave Chinner 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2011-08-30 6:39 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, xfs On Tue, Aug 30, 2011 at 04:24:16PM +1000, Dave Chinner wrote: > > xfs_mark_inode_dirty_sync( > > @@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync( > > > > if (!(inode->i_state & (I_WILL_FREE|I_FREEING))) > > mark_inode_dirty_sync(inode); > > + else { > > + barrier(); > > + ip->i_update_core = 1; > > + } > > } > > Why the barrier()? Isn't that just a compiler barrier? If you are > worried about catching the update vs clearing it in transaction > commit, shouldn't that use smp_mb() instead (in both places)? It's a blind copy & past from xfs_fs_dirty_inode. The comments there suggests it is for update ordering. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount 2011-08-30 6:39 ` Christoph Hellwig @ 2011-08-30 7:20 ` Dave Chinner 2011-08-30 7:27 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: Dave Chinner @ 2011-08-30 7:20 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs On Tue, Aug 30, 2011 at 02:39:49AM -0400, Christoph Hellwig wrote: > On Tue, Aug 30, 2011 at 04:24:16PM +1000, Dave Chinner wrote: > > > xfs_mark_inode_dirty_sync( > > > @@ -82,6 +81,10 @@ xfs_mark_inode_dirty_sync( > > > > > > if (!(inode->i_state & (I_WILL_FREE|I_FREEING))) > > > mark_inode_dirty_sync(inode); > > > + else { > > > + barrier(); > > > + ip->i_update_core = 1; > > > + } > > > } > > > > Why the barrier()? Isn't that just a compiler barrier? If you are > > worried about catching the update vs clearing it in transaction > > commit, shouldn't that use smp_mb() instead (in both places)? > > It's a blind copy & past from xfs_fs_dirty_inode. The comments > there suggests it is for update ordering. Right, I've always wondered about that, because the corresponding code talks about requiring strongly ordered memory semantics and that the compiler does this via SYNCHRONIZE() (barrier). >From xfs_inode_item_format: * We clear i_update_core before copying out the data. * This is for coordination with our timestamp updates * that don't hold the inode lock. They will always * update the timestamps BEFORE setting i_update_core, * so if we clear i_update_core after they set it we * are guaranteed to see their updates to the timestamps * either here. Likewise, if they set it after we clear it * here, we'll see it either on the next commit of this * inode or the next time the inode gets flushed via * xfs_iflush(). This depends on strongly ordered memory * semantics, but we have that. We use the SYNCHRONIZE * macro to make sure that the compiler does not reorder * the i_update_core access below the data copy below. */ if (ip->i_update_core) { ip->i_update_core = 0; SYNCHRONIZE(); } Now that may have been true on Irix/MIPS which had strong memory ordering so only compiler barriers were necessary. However, normally when we talk about ordered memory semantics in Linux, we cannot assume strong ordering - if we have ordering requirements, we have to guarantee ordering by explicit use of memory barriers, right? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount 2011-08-30 7:20 ` Dave Chinner @ 2011-08-30 7:27 ` Christoph Hellwig 2011-08-31 22:51 ` Dave Chinner 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2011-08-30 7:27 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, xfs On Tue, Aug 30, 2011 at 05:20:13PM +1000, Dave Chinner wrote: > Now that may have been true on Irix/MIPS which had strong memory > ordering so only compiler barriers were necessary. > > However, normally when we talk about ordered memory semantics in > Linux, we cannot assume strong ordering - if we have ordering > requirements, we have to guarantee ordering by explicit use of > memory barriers, right? Probably. But I'm not worried about that so much, it's just timestamps we're talking about as the size already has the ilock unlock as full barrier, and we're going to kill this code soon anyway. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount 2011-08-30 7:27 ` Christoph Hellwig @ 2011-08-31 22:51 ` Dave Chinner 0 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2011-08-31 22:51 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs On Tue, Aug 30, 2011 at 03:27:21AM -0400, Christoph Hellwig wrote: > On Tue, Aug 30, 2011 at 05:20:13PM +1000, Dave Chinner wrote: > > Now that may have been true on Irix/MIPS which had strong memory > > ordering so only compiler barriers were necessary. > > > > However, normally when we talk about ordered memory semantics in > > Linux, we cannot assume strong ordering - if we have ordering > > requirements, we have to guarantee ordering by explicit use of > > memory barriers, right? > > Probably. But I'm not worried about that so much, it's just timestamps > we're talking about as the size already has the ilock unlock as full > barrier, and we're going to kill this code soon anyway. Fair enough. Consider it: Reviewed-by: Dave Chinner <dchinner@redhat.com> -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] xfs: fix ->write_inode return values 2011-08-27 5:57 [PATCH 0/2] make sure to always update the inode size on umount Christoph Hellwig 2011-08-27 5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig @ 2011-08-27 5:57 ` Christoph Hellwig 2011-08-30 6:25 ` Dave Chinner 1 sibling, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2011-08-27 5:57 UTC (permalink / raw) To: xfs Currently we always redirty an inode that was attempted to be written out synchronously but has been cleaned by an AIL pushed internall, which is rather bogus. Fix that by doing the i_update_core check early on and return 0 for it. Also include async calls for it, as doing any work for those is just as pointless. While we're at it also fix the sign for the EIO return in case of a filesystem shutdown, and fix the completely non-sensical locking around xfs_log_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: linux-2.6/fs/xfs/xfs_super.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_super.c 2011-08-26 14:47:54.320317391 +0200 +++ linux-2.6/fs/xfs/xfs_super.c 2011-08-26 14:51:46.095728421 +0200 @@ -877,33 +877,17 @@ xfs_log_inode( struct xfs_trans *tp; int error; - xfs_iunlock(ip, XFS_ILOCK_SHARED); tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS); error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0); - if (error) { xfs_trans_cancel(tp, 0); - /* we need to return with the lock hold shared */ - xfs_ilock(ip, XFS_ILOCK_SHARED); return error; } xfs_ilock(ip, XFS_ILOCK_EXCL); - - /* - * Note - it's possible that we might have pushed ourselves out of the - * way during trans_reserve which would flush the inode. But there's - * no guarantee that the inode buffer has actually gone out yet (it's - * delwri). Plus the buffer could be pinned anyway if it's part of - * an inode in another recent transaction. So we play it safe and - * fire off the transaction anyway. - */ - xfs_trans_ijoin(tp, ip); + xfs_trans_ijoin_ref(tp, ip, XFS_ILOCK_EXCL); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - error = xfs_trans_commit(tp, 0); - xfs_ilock_demote(ip, XFS_ILOCK_EXCL); - - return error; + return xfs_trans_commit(tp, 0); } STATIC int @@ -918,7 +902,9 @@ xfs_fs_write_inode( trace_xfs_write_inode(ip); if (XFS_FORCED_SHUTDOWN(mp)) - return XFS_ERROR(EIO); + return -XFS_ERROR(EIO); + if (!ip->i_update_core) + return 0; if (wbc->sync_mode == WB_SYNC_ALL) { /* @@ -929,12 +915,10 @@ xfs_fs_write_inode( * of synchronous log foces dramatically. */ xfs_ioend_wait(ip); - xfs_ilock(ip, XFS_ILOCK_SHARED); - if (ip->i_update_core) { - error = xfs_log_inode(ip); - if (error) - goto out_unlock; - } + error = xfs_log_inode(ip); + if (error) + goto out; + return 0; } else { /* * We make this non-blocking if the inode is contended, return _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xfs: fix ->write_inode return values 2011-08-27 5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig @ 2011-08-30 6:25 ` Dave Chinner 0 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2011-08-30 6:25 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs On Sat, Aug 27, 2011 at 01:57:55AM -0400, Christoph Hellwig wrote: > Currently we always redirty an inode that was attempted to be written out > synchronously but has been cleaned by an AIL pushed internall, which is > rather bogus. Fix that by doing the i_update_core check early on and > return 0 for it. Also include async calls for it, as doing any work for > those is just as pointless. While we're at it also fix the sign for the > EIO return in case of a filesystem shutdown, and fix the completely > non-sensical locking around xfs_log_inode. > > Signed-off-by: Christoph Hellwig <hch@lst.de> Makes sense. Reviewed-by: Dave Chinner <dchinner@redhat.com> -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-08-31 22:51 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-08-27 5:57 [PATCH 0/2] make sure to always update the inode size on umount Christoph Hellwig 2011-08-27 5:57 ` [PATCH 1/2] xfs: fix xfs_mark_inode_dirty during umount Christoph Hellwig 2011-08-30 6:24 ` Dave Chinner 2011-08-30 6:39 ` Christoph Hellwig 2011-08-30 7:20 ` Dave Chinner 2011-08-30 7:27 ` Christoph Hellwig 2011-08-31 22:51 ` Dave Chinner 2011-08-27 5:57 ` [PATCH 2/2] xfs: fix ->write_inode return values Christoph Hellwig 2011-08-30 6:25 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox