* [PATCH 0/2] xfs: extent swap fixes @ 2014-07-31 6:12 Dave Chinner 2014-07-31 6:12 ` [PATCH 1/2] xfs: fix swapext ilock deadlock Dave Chinner 2014-07-31 6:12 ` [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents Dave Chinner 0 siblings, 2 replies; 11+ messages in thread From: Dave Chinner @ 2014-07-31 6:12 UTC (permalink / raw) To: xfs Hi folks, The extent swap code is not symmetrical, so correct behaviour is dependent on userspace doing the right thing. If we try to swap extents with the temporary inode in the wrong state we can have interesting failures. These two patches fix those issues and treat the two files identically in terms of the state of the data contained in the files being swapped. Failure to write or remove all cached data from either file will now cause the swap extents operation to fail. Comments, thoughts? Cheers, Dave. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] xfs: fix swapext ilock deadlock 2014-07-31 6:12 [PATCH 0/2] xfs: extent swap fixes Dave Chinner @ 2014-07-31 6:12 ` Dave Chinner 2014-07-31 17:15 ` Christoph Hellwig 2014-07-31 6:12 ` [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents Dave Chinner 1 sibling, 1 reply; 11+ messages in thread From: Dave Chinner @ 2014-07-31 6:12 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> xfs_swap_extents() holds the ilock over a call to filemap_write_and_wait(), which can then try to write data and take the ilock. That causes a self-deadlock. Fix the deadlock and clean up the code by separating the locking appropriately. Add a lockflags variable to track what locks we are holding as we gain and drop them and cleanup the error handling to always use "out_unlock" with the lockflags variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_bmap_util.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index bbc748a..3c60c43 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1633,6 +1633,7 @@ xfs_swap_extents( int aforkblks = 0; int taforkblks = 0; __uint64_t tmp; + int lock_flags; tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL); if (!tempifp) { @@ -1641,13 +1642,13 @@ xfs_swap_extents( } /* - * we have to do two separate lock calls here to keep lockdep - * happy. If we try to get all the locks in one call, lock will - * report false positives when we drop the ILOCK and regain them - * below. + * Lock up the inodes against other IO and truncate to begin with. + * Then we can ensure the inodes are flushed and have no page cache + * safely. Once we have done this we can take the ilocks and do the rest + * of the checks. */ + lock_flags = XFS_IOLOCK_EXCL; xfs_lock_two_inodes(ip, tip, XFS_IOLOCK_EXCL); - xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); /* Verify that both files have the same format */ if ((ip->i_d.di_mode & S_IFMT) != (tip->i_d.di_mode & S_IFMT)) { @@ -1666,6 +1667,9 @@ xfs_swap_extents( goto out_unlock; truncate_pagecache_range(VFS_I(tip), 0, -1); + xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); + lock_flags |= XFS_ILOCK_EXCL; + /* Verify O_DIRECT for ftmp */ if (VFS_I(tip)->i_mapping->nrpages) { error = -EINVAL; @@ -1720,6 +1724,7 @@ xfs_swap_extents( xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_iunlock(tip, XFS_ILOCK_EXCL); + lock_flags &= ~XFS_ILOCK_EXCL; /* * There is a race condition here since we gave up the @@ -1732,13 +1737,11 @@ xfs_swap_extents( tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT); error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); - if (error) { - xfs_iunlock(ip, XFS_IOLOCK_EXCL); - xfs_iunlock(tip, XFS_IOLOCK_EXCL); - xfs_trans_cancel(tp, 0); - goto out; - } + if (error) + goto out_trans_cancel; + xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); + lock_flags |= XFS_ILOCK_EXCL; /* * Count the number of extended attribute blocks @@ -1757,8 +1760,8 @@ xfs_swap_extents( goto out_trans_cancel; } - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); - xfs_trans_ijoin(tp, tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_trans_ijoin(tp, ip, lock_flags); + xfs_trans_ijoin(tp, tip, lock_flags); /* * Before we've swapped the forks, lets set the owners of the forks @@ -1887,8 +1890,8 @@ out: return error; out_unlock: - xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); - xfs_iunlock(tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_iunlock(ip, lock_flags); + xfs_iunlock(tip, lock_flags); goto out; out_trans_cancel: -- 2.0.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] xfs: fix swapext ilock deadlock 2014-07-31 6:12 ` [PATCH 1/2] xfs: fix swapext ilock deadlock Dave Chinner @ 2014-07-31 17:15 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2014-07-31 17:15 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Thu, Jul 31, 2014 at 04:12:07PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > xfs_swap_extents() holds the ilock over a call to > filemap_write_and_wait(), which can then try to write data and take > the ilock. That causes a self-deadlock. > > Fix the deadlock and clean up the code by separating the locking > appropriately. Add a lockflags variable to track what locks we are > holding as we gain and drop them and cleanup the error handling to > always use "out_unlock" with the lockflags variable. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents 2014-07-31 6:12 [PATCH 0/2] xfs: extent swap fixes Dave Chinner 2014-07-31 6:12 ` [PATCH 1/2] xfs: fix swapext ilock deadlock Dave Chinner @ 2014-07-31 6:12 ` Dave Chinner 2014-07-31 17:16 ` Christoph Hellwig 2014-08-01 12:44 ` Brian Foster 1 sibling, 2 replies; 11+ messages in thread From: Dave Chinner @ 2014-07-31 6:12 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> We need to treat both inodes identically from a page cache point of view when prepareing them for extent swapping. We don't do this right now - we assume that one of the inodes empty, because that's what xfs_fsr currently does. Remove this assumption from the code. While factoring out the flushing and related checks, move the transactions reservation to immeidately after the flushes so that we don't need to pick up and then drop the ilock to do the transaction reservation. There are no issues with aborting the transaction it if the checks fail before we join the inodes to the transaction and dirty them, so this is a safe change to make. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_bmap_util.c | 81 +++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 44 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 3c60c43..2f1e30d 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1619,6 +1619,30 @@ xfs_swap_extents_check_format( } int +xfs_swap_extent_flush( + struct xfs_inode *ip) +{ + int error; + + error = filemap_write_and_wait(VFS_I(ip)->i_mapping); + if (error) + return error; + truncate_pagecache_range(VFS_I(ip), 0, -1); + + /* Verify O_DIRECT for ftmp */ + if (VFS_I(ip)->i_mapping->nrpages) + return -EINVAL; + + /* + * Don't try to swap extents on mmap()d files because we can't lock + * out races against page faults safely. + */ + if (mapping_mapped(VFS_I(ip)->i_mapping)) + return -EBUSY; + return 0; +} + +int xfs_swap_extents( xfs_inode_t *ip, /* target inode */ xfs_inode_t *tip, /* tmp inode */ @@ -1662,26 +1686,28 @@ xfs_swap_extents( goto out_unlock; } - error = filemap_write_and_wait(VFS_I(tip)->i_mapping); + error = xfs_swap_extent_flush(ip); + if (error) + goto out_unlock; + error = xfs_swap_extent_flush(tip); if (error) goto out_unlock; - truncate_pagecache_range(VFS_I(tip), 0, -1); - - xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); - lock_flags |= XFS_ILOCK_EXCL; - /* Verify O_DIRECT for ftmp */ - if (VFS_I(tip)->i_mapping->nrpages) { - error = -EINVAL; + tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT); + error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); + if (error) { + xfs_trans_cancel(tp, 0); goto out_unlock; } + xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); + lock_flags |= XFS_ILOCK_EXCL; /* Verify all data are being swapped */ if (sxp->sx_offset != 0 || sxp->sx_length != ip->i_d.di_size || sxp->sx_length != tip->i_d.di_size) { error = -EFAULT; - goto out_unlock; + goto out_trans_cancel; } trace_xfs_swap_extent_before(ip, 0); @@ -1693,7 +1719,7 @@ xfs_swap_extents( xfs_notice(mp, "%s: inode 0x%llx format is incompatible for exchanging.", __func__, ip->i_ino); - goto out_unlock; + goto out_trans_cancel; } /* @@ -1708,41 +1734,8 @@ xfs_swap_extents( (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) || (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) { error = -EBUSY; - goto out_unlock; - } - - /* We need to fail if the file is memory mapped. Once we have tossed - * all existing pages, the page fault will have no option - * but to go to the filesystem for pages. By making the page fault call - * vop_read (or write in the case of autogrow) they block on the iolock - * until we have switched the extents. - */ - if (mapping_mapped(VFS_I(ip)->i_mapping)) { - error = -EBUSY; - goto out_unlock; - } - - xfs_iunlock(ip, XFS_ILOCK_EXCL); - xfs_iunlock(tip, XFS_ILOCK_EXCL); - lock_flags &= ~XFS_ILOCK_EXCL; - - /* - * There is a race condition here since we gave up the - * ilock. However, the data fork will not change since - * we have the iolock (locked for truncation too) so we - * are safe. We don't really care if non-io related - * fields change. - */ - truncate_pagecache_range(VFS_I(ip), 0, -1); - - tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT); - error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); - if (error) goto out_trans_cancel; - - xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); - lock_flags |= XFS_ILOCK_EXCL; - + } /* * Count the number of extended attribute blocks */ -- 2.0.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents 2014-07-31 6:12 ` [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents Dave Chinner @ 2014-07-31 17:16 ` Christoph Hellwig 2014-07-31 23:02 ` Dave Chinner 2014-08-01 12:44 ` Brian Foster 1 sibling, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2014-07-31 17:16 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs > int > +xfs_swap_extent_flush( > + struct xfs_inode *ip) > +{ > + int error; nipick: shouldn't the arguments and local variables align to the same level? Also a local struct inode variable instead of using VFS_I(ip) 4 times would be nice. Otherwise looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents 2014-07-31 17:16 ` Christoph Hellwig @ 2014-07-31 23:02 ` Dave Chinner 0 siblings, 0 replies; 11+ messages in thread From: Dave Chinner @ 2014-07-31 23:02 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs On Thu, Jul 31, 2014 at 10:16:02AM -0700, Christoph Hellwig wrote: > > int > > +xfs_swap_extent_flush( > > + struct xfs_inode *ip) > > +{ > > + int error; > > nipick: shouldn't the arguments and local variables align to the same > level? *nod* > Also a local struct inode variable instead of using VFS_I(ip) 4 times > would be nice. Will fix. > Otherwise looks good, > > Reviewed-by: Christoph Hellwig <hch@lst.de> Thanks. -Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents 2014-07-31 6:12 ` [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents Dave Chinner 2014-07-31 17:16 ` Christoph Hellwig @ 2014-08-01 12:44 ` Brian Foster 2014-08-02 3:19 ` Dave Chinner 1 sibling, 1 reply; 11+ messages in thread From: Brian Foster @ 2014-08-01 12:44 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Thu, Jul 31, 2014 at 04:12:08PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > We need to treat both inodes identically from a page cache point of > view when prepareing them for extent swapping. We don't do this > right now - we assume that one of the inodes empty, because that's > what xfs_fsr currently does. Remove this assumption from the code. > > While factoring out the flushing and related checks, move the > transactions reservation to immeidately after the flushes so that we > don't need to pick up and then drop the ilock to do the transaction > reservation. There are no issues with aborting the transaction it if > the checks fail before we join the inodes to the transaction and > dirty them, so this is a safe change to make. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- Both of these looked fine to me, but I couldn't apply this one to for-next or master... Brian > fs/xfs/xfs_bmap_util.c | 81 +++++++++++++++++++++++--------------------------- > 1 file changed, 37 insertions(+), 44 deletions(-) > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > index 3c60c43..2f1e30d 100644 > --- a/fs/xfs/xfs_bmap_util.c > +++ b/fs/xfs/xfs_bmap_util.c > @@ -1619,6 +1619,30 @@ xfs_swap_extents_check_format( > } > > int > +xfs_swap_extent_flush( > + struct xfs_inode *ip) > +{ > + int error; > + > + error = filemap_write_and_wait(VFS_I(ip)->i_mapping); > + if (error) > + return error; > + truncate_pagecache_range(VFS_I(ip), 0, -1); > + > + /* Verify O_DIRECT for ftmp */ > + if (VFS_I(ip)->i_mapping->nrpages) > + return -EINVAL; > + > + /* > + * Don't try to swap extents on mmap()d files because we can't lock > + * out races against page faults safely. > + */ > + if (mapping_mapped(VFS_I(ip)->i_mapping)) > + return -EBUSY; > + return 0; > +} > + > +int > xfs_swap_extents( > xfs_inode_t *ip, /* target inode */ > xfs_inode_t *tip, /* tmp inode */ > @@ -1662,26 +1686,28 @@ xfs_swap_extents( > goto out_unlock; > } > > - error = filemap_write_and_wait(VFS_I(tip)->i_mapping); > + error = xfs_swap_extent_flush(ip); > + if (error) > + goto out_unlock; > + error = xfs_swap_extent_flush(tip); > if (error) > goto out_unlock; > - truncate_pagecache_range(VFS_I(tip), 0, -1); > - > - xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); > - lock_flags |= XFS_ILOCK_EXCL; > > - /* Verify O_DIRECT for ftmp */ > - if (VFS_I(tip)->i_mapping->nrpages) { > - error = -EINVAL; > + tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT); > + error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); > + if (error) { > + xfs_trans_cancel(tp, 0); > goto out_unlock; > } > + xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); > + lock_flags |= XFS_ILOCK_EXCL; > > /* Verify all data are being swapped */ > if (sxp->sx_offset != 0 || > sxp->sx_length != ip->i_d.di_size || > sxp->sx_length != tip->i_d.di_size) { > error = -EFAULT; > - goto out_unlock; > + goto out_trans_cancel; > } > > trace_xfs_swap_extent_before(ip, 0); > @@ -1693,7 +1719,7 @@ xfs_swap_extents( > xfs_notice(mp, > "%s: inode 0x%llx format is incompatible for exchanging.", > __func__, ip->i_ino); > - goto out_unlock; > + goto out_trans_cancel; > } > > /* > @@ -1708,41 +1734,8 @@ xfs_swap_extents( > (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) || > (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) { > error = -EBUSY; > - goto out_unlock; > - } > - > - /* We need to fail if the file is memory mapped. Once we have tossed > - * all existing pages, the page fault will have no option > - * but to go to the filesystem for pages. By making the page fault call > - * vop_read (or write in the case of autogrow) they block on the iolock > - * until we have switched the extents. > - */ > - if (mapping_mapped(VFS_I(ip)->i_mapping)) { > - error = -EBUSY; > - goto out_unlock; > - } > - > - xfs_iunlock(ip, XFS_ILOCK_EXCL); > - xfs_iunlock(tip, XFS_ILOCK_EXCL); > - lock_flags &= ~XFS_ILOCK_EXCL; > - > - /* > - * There is a race condition here since we gave up the > - * ilock. However, the data fork will not change since > - * we have the iolock (locked for truncation too) so we > - * are safe. We don't really care if non-io related > - * fields change. > - */ > - truncate_pagecache_range(VFS_I(ip), 0, -1); > - > - tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT); > - error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); > - if (error) > goto out_trans_cancel; > - > - xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); > - lock_flags |= XFS_ILOCK_EXCL; > - > + } > /* > * Count the number of extended attribute blocks > */ > -- > 2.0.0 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents 2014-08-01 12:44 ` Brian Foster @ 2014-08-02 3:19 ` Dave Chinner 2014-08-02 11:24 ` Brian Foster 0 siblings, 1 reply; 11+ messages in thread From: Dave Chinner @ 2014-08-02 3:19 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Fri, Aug 01, 2014 at 08:44:02AM -0400, Brian Foster wrote: > On Thu, Jul 31, 2014 at 04:12:08PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > We need to treat both inodes identically from a page cache point of > > view when prepareing them for extent swapping. We don't do this > > right now - we assume that one of the inodes empty, because that's > > what xfs_fsr currently does. Remove this assumption from the code. > > > > While factoring out the flushing and related checks, move the > > transactions reservation to immeidately after the flushes so that we > > don't need to pick up and then drop the ilock to do the transaction > > reservation. There are no issues with aborting the transaction it if > > the checks fail before we join the inodes to the transaction and > > dirty them, so this is a safe change to make. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > Both of these looked fine to me, but I couldn't apply this one to > for-next or master... It's actually in my working branch, which means it's based on 3.16-rc5 + random-outside-xfs-patches + for-next + verifier fixes + sb discombobulation and then this patch set. I didn't check that it applied directly against for-next - do you want me to rebase and resend it? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents 2014-08-02 3:19 ` Dave Chinner @ 2014-08-02 11:24 ` Brian Foster 0 siblings, 0 replies; 11+ messages in thread From: Brian Foster @ 2014-08-02 11:24 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Sat, Aug 02, 2014 at 01:19:31PM +1000, Dave Chinner wrote: > On Fri, Aug 01, 2014 at 08:44:02AM -0400, Brian Foster wrote: > > On Thu, Jul 31, 2014 at 04:12:08PM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > We need to treat both inodes identically from a page cache point of > > > view when prepareing them for extent swapping. We don't do this > > > right now - we assume that one of the inodes empty, because that's > > > what xfs_fsr currently does. Remove this assumption from the code. > > > > > > While factoring out the flushing and related checks, move the > > > transactions reservation to immeidately after the flushes so that we > > > don't need to pick up and then drop the ilock to do the transaction > > > reservation. There are no issues with aborting the transaction it if > > > the checks fail before we join the inodes to the transaction and > > > dirty them, so this is a safe change to make. > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > --- > > > > Both of these looked fine to me, but I couldn't apply this one to > > for-next or master... > > It's actually in my working branch, which means it's based on > 3.16-rc5 + random-outside-xfs-patches + for-next + verifier fixes + > sb discombobulation and then this patch set. I didn't check that it > applied directly against for-next - do you want me to rebase and > resend it? > Ok, no worries. I was just trying to compile test and verify that nothing was unexpected with the patch as posted. I'm good as long we get some lead team for testing and whatnot when this gets merged to a dev. branch. :) Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 0/2] xfs: fix a couple of swap extent issues @ 2014-06-06 8:22 Dave Chinner 2014-06-06 8:22 ` [PATCH 1/2] xfs: fix swapext ilock deadlock Dave Chinner 0 siblings, 1 reply; 11+ messages in thread From: Dave Chinner @ 2014-06-06 8:22 UTC (permalink / raw) To: xfs Hi folks, Eric pointed out that this morning that if xfs_fsr didn't fsync the target file, the swap extents ioctl deadlocked. Dumb bug, simple fix. I also made sure that swap extent treats both inodes the same in terms of flushing, page cache pages and being mmap()d so that if anyone other than xfs_fsr uses it it doesn't go boom when the second inode is not a newly created empty file. Thoughts? -Dave. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] xfs: fix swapext ilock deadlock 2014-06-06 8:22 [PATCH 0/2] xfs: fix a couple of swap extent issues Dave Chinner @ 2014-06-06 8:22 ` Dave Chinner 2014-06-06 19:59 ` Brian Foster 0 siblings, 1 reply; 11+ messages in thread From: Dave Chinner @ 2014-06-06 8:22 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> xfs_swap_extents() holds the ilock over a call to filemap_write_and_wait(), which can then try to write data and take the ilock. That causes a self-deadlock. Fix the deadlock and clean up the code by separating the locking appropriately. Add a lockflags variable to track what locks we are holding as we gain and drop them and cleanup the error handling to always use "out_unlock" with the lockflags variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_bmap_util.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 703b3ec..948eba1 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1686,6 +1686,7 @@ xfs_swap_extents( int aforkblks = 0; int taforkblks = 0; __uint64_t tmp; + int lock_flags; tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL); if (!tempifp) { @@ -1694,13 +1695,13 @@ xfs_swap_extents( } /* - * we have to do two separate lock calls here to keep lockdep - * happy. If we try to get all the locks in one call, lock will - * report false positives when we drop the ILOCK and regain them - * below. + * Lock up the inodes against other IO and truncate to begin with. + * Then we can ensure the inodes are flushed and have no page cache + * safely. Once we have done this we can take the ilocks and do the rest + * of the checks. */ + lock_flags = XFS_IOLOCK_EXCL; xfs_lock_two_inodes(ip, tip, XFS_IOLOCK_EXCL); - xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); /* Verify that both files have the same format */ if ((ip->i_d.di_mode & S_IFMT) != (tip->i_d.di_mode & S_IFMT)) { @@ -1719,6 +1720,9 @@ xfs_swap_extents( goto out_unlock; truncate_pagecache_range(VFS_I(tip), 0, -1); + xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); + lock_flags |= XFS_ILOCK_EXCL; + /* Verify O_DIRECT for ftmp */ if (VN_CACHED(VFS_I(tip)) != 0) { error = XFS_ERROR(EINVAL); @@ -1773,6 +1777,7 @@ xfs_swap_extents( xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_iunlock(tip, XFS_ILOCK_EXCL); + lock_flags &= ~XFS_ILOCK_EXCL; /* * There is a race condition here since we gave up the @@ -1785,13 +1790,11 @@ xfs_swap_extents( tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT); error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); - if (error) { - xfs_iunlock(ip, XFS_IOLOCK_EXCL); - xfs_iunlock(tip, XFS_IOLOCK_EXCL); - xfs_trans_cancel(tp, 0); - goto out; - } + if (error) + goto out_trans_cancel; + xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); + lock_flags |= XFS_ILOCK_EXCL; /* * Count the number of extended attribute blocks @@ -1810,8 +1813,8 @@ xfs_swap_extents( goto out_trans_cancel; } - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); - xfs_trans_ijoin(tp, tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_trans_ijoin(tp, ip, lock_flags); + xfs_trans_ijoin(tp, tip, lock_flags); /* * Before we've swapped the forks, lets set the owners of the forks @@ -1940,8 +1943,8 @@ out: return error; out_unlock: - xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); - xfs_iunlock(tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); + xfs_iunlock(ip, lock_flags); + xfs_iunlock(tip, lock_flags); goto out; out_trans_cancel: -- 2.0.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] xfs: fix swapext ilock deadlock 2014-06-06 8:22 ` [PATCH 1/2] xfs: fix swapext ilock deadlock Dave Chinner @ 2014-06-06 19:59 ` Brian Foster 0 siblings, 0 replies; 11+ messages in thread From: Brian Foster @ 2014-06-06 19:59 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Fri, Jun 06, 2014 at 06:22:52PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > xfs_swap_extents() holds the ilock over a call to > filemap_write_and_wait(), which can then try to write data and take > the ilock. That causes a self-deadlock. > > Fix the deadlock and clean up the code by separating the locking > appropriately. Add a lockflags variable to track what locks we are > holding as we gain and drop them and cleanup the error handling to > always use "out_unlock" with the lockflags variable. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_bmap_util.c | 33 ++++++++++++++++++--------------- > 1 file changed, 18 insertions(+), 15 deletions(-) > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > index 703b3ec..948eba1 100644 > --- a/fs/xfs/xfs_bmap_util.c > +++ b/fs/xfs/xfs_bmap_util.c > @@ -1686,6 +1686,7 @@ xfs_swap_extents( > int aforkblks = 0; > int taforkblks = 0; > __uint64_t tmp; > + int lock_flags; > > tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL); > if (!tempifp) { > @@ -1694,13 +1695,13 @@ xfs_swap_extents( > } > > /* > - * we have to do two separate lock calls here to keep lockdep > - * happy. If we try to get all the locks in one call, lock will > - * report false positives when we drop the ILOCK and regain them > - * below. > + * Lock up the inodes against other IO and truncate to begin with. > + * Then we can ensure the inodes are flushed and have no page cache > + * safely. Once we have done this we can take the ilocks and do the rest > + * of the checks. > */ > + lock_flags = XFS_IOLOCK_EXCL; > xfs_lock_two_inodes(ip, tip, XFS_IOLOCK_EXCL); > - xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); > > /* Verify that both files have the same format */ > if ((ip->i_d.di_mode & S_IFMT) != (tip->i_d.di_mode & S_IFMT)) { Perhaps move these checks further down to where we eventually grab the ilock..? It's not clear to me if that is important. I was going to suggest to pull up the tp allocation as well to eliminate the ilock lock/unlock/lock cycle, but it looks like you've done that in patch 2. However... > @@ -1719,6 +1720,9 @@ xfs_swap_extents( > goto out_unlock; > truncate_pagecache_range(VFS_I(tip), 0, -1); > > + xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); > + lock_flags |= XFS_ILOCK_EXCL; > + > /* Verify O_DIRECT for ftmp */ > if (VN_CACHED(VFS_I(tip)) != 0) { > error = XFS_ERROR(EINVAL); > @@ -1773,6 +1777,7 @@ xfs_swap_extents( > > xfs_iunlock(ip, XFS_ILOCK_EXCL); > xfs_iunlock(tip, XFS_ILOCK_EXCL); > + lock_flags &= ~XFS_ILOCK_EXCL; > > /* > * There is a race condition here since we gave up the > @@ -1785,13 +1790,11 @@ xfs_swap_extents( > > tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT); > error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); > - if (error) { > - xfs_iunlock(ip, XFS_IOLOCK_EXCL); > - xfs_iunlock(tip, XFS_IOLOCK_EXCL); > - xfs_trans_cancel(tp, 0); > - goto out; > - } > + if (error) > + goto out_trans_cancel; > + > xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL); > + lock_flags |= XFS_ILOCK_EXCL; > > /* > * Count the number of extended attribute blocks > @@ -1810,8 +1813,8 @@ xfs_swap_extents( > goto out_trans_cancel; > } > > - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); > - xfs_trans_ijoin(tp, tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); > + xfs_trans_ijoin(tp, ip, lock_flags); > + xfs_trans_ijoin(tp, tip, lock_flags); > This still stays around in the middle of the function. It looks to me that the error handling is broken here irrespective of these changes. e.g., the xfs_trans_cancel() will unlock the inodes assuming the lock flags are transferred to the tp when the inodes are joined, yes? If so, out_trans_cancel will try to unlock the inodes after the cancel, even after the inodes have been joined. It might be good to join the inodes immediately after we acquire all the required locks (which is right after we take care of the transaction with both of these patches applied) and fix out_trans_cancel appropriately. Brian > /* > * Before we've swapped the forks, lets set the owners of the forks > @@ -1940,8 +1943,8 @@ out: > return error; > > out_unlock: > - xfs_iunlock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); > - xfs_iunlock(tip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL); > + xfs_iunlock(ip, lock_flags); > + xfs_iunlock(tip, lock_flags); > goto out; > > out_trans_cancel: > -- > 2.0.0 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-08-02 11:24 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-31 6:12 [PATCH 0/2] xfs: extent swap fixes Dave Chinner 2014-07-31 6:12 ` [PATCH 1/2] xfs: fix swapext ilock deadlock Dave Chinner 2014-07-31 17:15 ` Christoph Hellwig 2014-07-31 6:12 ` [PATCH 2/2] xfs: flush both inodes in xfs_swap_extents Dave Chinner 2014-07-31 17:16 ` Christoph Hellwig 2014-07-31 23:02 ` Dave Chinner 2014-08-01 12:44 ` Brian Foster 2014-08-02 3:19 ` Dave Chinner 2014-08-02 11:24 ` Brian Foster -- strict thread matches above, loose matches on Subject: below -- 2014-06-06 8:22 [PATCH 0/2] xfs: fix a couple of swap extent issues Dave Chinner 2014-06-06 8:22 ` [PATCH 1/2] xfs: fix swapext ilock deadlock Dave Chinner 2014-06-06 19:59 ` Brian Foster
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox