* [PATCH v2] xfs: fix unbalanced inode reclaim flush locking
From: Brian Foster @ 2016-10-17 18:07 UTC (permalink / raw)
To: linux-xfs; +Cc: zlang
Filesystem shutdown testing on an older distro kernel has uncovered an
imbalanced locking pattern for the inode flush lock in
xfs_reclaim_inode(). Specifically, there is a double unlock sequence
between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
"reclaim:" label.
This actually does not cause obvious problems on current kernels due to
the current flush lock implementation. Older kernels use a counting
based flush lock mechanism, however, which effectively breaks the lock
indefinitely when an already unlocked flush lock is repeatedly unlocked.
Though this currently occurs only on filesystem shutdown, it has
been observed to escalate an fs shutdown into a system-wide crash
or hang.
Because this problem exists on filesystem shutdown and thus only after
unrelated catastrophic failure, issue the simple fix to reacquire the
flush lock in xfs_reclaim_inode() before jumping to the reclaim code.
Add an assert to xfs_ifunlock() to help prevent future occurrences of
the same problem. Finally, update a couple places to bitwise-OR the
reclaim flag to avoid smashing the flush lock in the process (which is
based on an inode flag in current kernels). This avoids a (spurious)
failure of the newly introduced xfs_ifunlock() assertion.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reported-by: Zorro Lang <zlang@redhat.com>
---
v2:
- Add comment in xfs_reclaim_inode() wrt the flush lock.
- Fix XFS_IRECLAIM usage in xfs_inode_free().
fs/xfs/xfs_icache.c | 9 +++++++--
fs/xfs/xfs_inode.h | 11 ++++++-----
2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 14796b7..2317b74 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -140,7 +140,7 @@ xfs_inode_free(
* races.
*/
spin_lock(&ip->i_flags_lock);
- ip->i_flags = XFS_IRECLAIM;
+ ip->i_flags |= XFS_IRECLAIM;
ip->i_ino = 0;
spin_unlock(&ip->i_flags_lock);
@@ -981,7 +981,12 @@ restart:
if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
xfs_iunpin_wait(ip);
+ /*
+ * xfs_iflush_abort() drops the flush lock. Reacquire it as the
+ * reclaim code expects to drop the flush lock.
+ */
xfs_iflush_abort(ip, false);
+ xfs_iflock(ip);
goto reclaim;
}
if (xfs_ipincount(ip)) {
@@ -1044,7 +1049,7 @@ reclaim:
* skip.
*/
spin_lock(&ip->i_flags_lock);
- ip->i_flags = XFS_IRECLAIM;
+ ip->i_flags |= XFS_IRECLAIM;
ip->i_ino = 0;
spin_unlock(&ip->i_flags_lock);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index f14c1de..71e8a81 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -246,6 +246,11 @@ static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
* Synchronize processes attempting to flush the in-core inode back to disk.
*/
+static inline int xfs_isiflocked(struct xfs_inode *ip)
+{
+ return xfs_iflags_test(ip, XFS_IFLOCK);
+}
+
extern void __xfs_iflock(struct xfs_inode *ip);
static inline int xfs_iflock_nowait(struct xfs_inode *ip)
@@ -261,16 +266,12 @@ static inline void xfs_iflock(struct xfs_inode *ip)
static inline void xfs_ifunlock(struct xfs_inode *ip)
{
+ ASSERT(xfs_isiflocked(ip));
xfs_iflags_clear(ip, XFS_IFLOCK);
smp_mb();
wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
}
-static inline int xfs_isiflocked(struct xfs_inode *ip)
-{
- return xfs_iflags_test(ip, XFS_IFLOCK);
-}
-
/*
* Flags for inode locking.
* Bit ranges: 1<<1 - 1<<16-1 -- iolock/ilock modes (bitfield)
--
2.7.4
* Re: [PATCH v2] xfs: fix unbalanced inode reclaim flush locking
From: Dave Chinner @ 2016-10-17 23:07 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-xfs, zlang
On Mon, Oct 17, 2016 at 02:07:53PM -0400, Brian Foster wrote:
> Filesystem shutdown testing on an older distro kernel has uncovered an
> imbalanced locking pattern for the inode flush lock in
> xfs_reclaim_inode(). Specifically, there is a double unlock sequence
> between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
> "reclaim:" label.
>
> This actually does not cause obvious problems on current kernels due to
> the current flush lock implementation. Older kernels use a counting
> based flush lock mechanism, however, which effectively breaks the lock
> indefinitely when an already unlocked flush lock is repeatedly unlocked.
> Though this only currently occurs on filesystem shutdown, it has
> reproduced the effect of elevating an fs shutdown to a system-wide crash
> or hang.
>
> Because this problem exists on filesystem shutdown and thus only after
> unrelated catastrophic failure, issue the simple fix to reacquire the
> flush lock in xfs_reclaim_inode() before jumping to the reclaim code.
> Add an assert to xfs_ifunlock() to help prevent future occurrences of
> the same problem. Finally, update a couple places to bitwise-OR the
> reclaim flag to avoid smashing the flush lock in the process (which is
> based on an inode flag in current kernels). This avoids a (spurious)
> failure of the newly introduced xfs_ifunlock() assertion.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reported-by: Zorro Lang <zlang@redhat.com>
> ---
>
> v2:
> - Add comment in xfs_reclaim_inode() wrt the flush lock.
> - Fix XFS_IRECLAIM usage in xfs_inode_free().
>
> fs/xfs/xfs_icache.c | 9 +++++++--
> fs/xfs/xfs_inode.h | 11 ++++++-----
> 2 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 14796b7..2317b74 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -140,7 +140,7 @@ xfs_inode_free(
> * races.
> */
> spin_lock(&ip->i_flags_lock);
> - ip->i_flags = XFS_IRECLAIM;
> + ip->i_flags |= XFS_IRECLAIM;
> ip->i_ino = 0;
> spin_unlock(&ip->i_flags_lock);
>
> @@ -981,7 +981,12 @@ restart:
>
> if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
> xfs_iunpin_wait(ip);
> + /*
> + * xfs_iflush_abort() drops the flush lock. Reacquire it as the
> + * reclaim code expects to drop the flush lock.
> + */
> xfs_iflush_abort(ip, false);
> + xfs_iflock(ip);
> goto reclaim;
> }
> if (xfs_ipincount(ip)) {
> @@ -1044,7 +1049,7 @@ reclaim:
> * skip.
> */
> spin_lock(&ip->i_flags_lock);
> - ip->i_flags = XFS_IRECLAIM;
> + ip->i_flags |= XFS_IRECLAIM;
> ip->i_ino = 0;
> spin_unlock(&ip->i_flags_lock);
I'd prefer that we don't change this - an inode that has had its
inode number cleared should also only have XFS_IRECLAIM set in its
flags.
FWIW, there is nothing that should be waiting on the flush lock by
this point - we hold the inode locked exclusively, there are no
other references to the inode, and we've got a clean inode. As long
as we've cycled the flush lock inside the current XFS_ILOCK_EXCL
hold, then we can guarantee the inode is clean and nothing is
waiting on the flush lock as you have to hold the ILOCK before
grabbing the flush lock.
Hence it doesn't matter if we hold or don't hold the flush lock
at the point where we set ip->i_flags = XFS_IRECLAIM and ip->i_ino =
0 - clearing the bit drops the flush lock if we hold it, and
clearing ip->i_ino means that any lookups inside the RCU grace
period (either from xfs_iflush_cluster or cache lookups) means
they'll drop the inode without doing anything with it.
Hence I think we can simply do something like:
}
- xfs_iflock(ip);
reclaim:
/*
* Because we use RCU freeing we need to ensure the inode always appears
* to be reclaimed with an invalid inode number when in the free state.
* We do this as early as possible under the ILOCK and flush lock so
* that xfs_iflush_cluster() can be guaranteed to detect races with us
* here. By doing this, we guarantee that once xfs_iflush_cluster has
* locked both the XFS_ILOCK and the flush lock that it will see either
* a valid, flushable inode that will serialise correctly against the
* locks below, or it will see a clean (and invalid) inode that it can
* skip.
+ *
+ * If we are currently holding the flush lock, then nothing else can
+ * be waiting on it as we hold the XFS_ILOCK_EXCL. Resetting the i_flags
+ * here effectively unlocks the flush lock if we currently hold it,
+ * but it's a no-op if we don't hold it. This means we don't have to
+ * add a lock cycle to the paths that drop the flush lock due to
+ * IO completion or abort here.
*/
spin_lock(&ip->i_flags_lock);
ip->i_flags = XFS_IRECLAIM;
ip->i_ino = 0;
spin_unlock(&ip->i_flags_lock);
- xfs_ifunlock(ip);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
The rest of the code (adding the assert to xfs_ifunlock()) looks
fine, but this code here is quite special because of the way it
controls behaviour of the inode lookups within the RCU grace
period...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2] xfs: fix unbalanced inode reclaim flush locking
From: Brian Foster @ 2016-10-18 14:13 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, zlang
On Tue, Oct 18, 2016 at 10:07:56AM +1100, Dave Chinner wrote:
> On Mon, Oct 17, 2016 at 02:07:53PM -0400, Brian Foster wrote:
> > Filesystem shutdown testing on an older distro kernel has uncovered an
> > imbalanced locking pattern for the inode flush lock in
> > xfs_reclaim_inode(). Specifically, there is a double unlock sequence
> > between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
> > "reclaim:" label.
> >
> > This actually does not cause obvious problems on current kernels due to
> > the current flush lock implementation. Older kernels use a counting
> > based flush lock mechanism, however, which effectively breaks the lock
> > indefinitely when an already unlocked flush lock is repeatedly unlocked.
> > Though this only currently occurs on filesystem shutdown, it has
> > reproduced the effect of elevating an fs shutdown to a system-wide crash
> > or hang.
> >
> > Because this problem exists on filesystem shutdown and thus only after
> > unrelated catastrophic failure, issue the simple fix to reacquire the
> > flush lock in xfs_reclaim_inode() before jumping to the reclaim code.
> > Add an assert to xfs_ifunlock() to help prevent future occurrences of
> > the same problem. Finally, update a couple places to bitwise-OR the
> > reclaim flag to avoid smashing the flush lock in the process (which is
> > based on an inode flag in current kernels). This avoids a (spurious)
> > failure of the newly introduced xfs_ifunlock() assertion.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > Reported-by: Zorro Lang <zlang@redhat.com>
> > ---
> >
> > v2:
> > - Add comment in xfs_reclaim_inode() wrt the flush lock.
> > - Fix XFS_IRECLAIM usage in xfs_inode_free().
> >
> > fs/xfs/xfs_icache.c | 9 +++++++--
> > fs/xfs/xfs_inode.h | 11 ++++++-----
> > 2 files changed, 13 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 14796b7..2317b74 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -140,7 +140,7 @@ xfs_inode_free(
> > * races.
> > */
> > spin_lock(&ip->i_flags_lock);
> > - ip->i_flags = XFS_IRECLAIM;
> > + ip->i_flags |= XFS_IRECLAIM;
> > ip->i_ino = 0;
> > spin_unlock(&ip->i_flags_lock);
> >
> > @@ -981,7 +981,12 @@ restart:
> >
> > if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
> > xfs_iunpin_wait(ip);
> > + /*
> > + * xfs_iflush_abort() drops the flush lock. Reacquire it as the
> > + * reclaim code expects to drop the flush lock.
> > + */
> > xfs_iflush_abort(ip, false);
> > + xfs_iflock(ip);
> > goto reclaim;
> > }
> > if (xfs_ipincount(ip)) {
> > @@ -1044,7 +1049,7 @@ reclaim:
> > * skip.
> > */
> > spin_lock(&ip->i_flags_lock);
> > - ip->i_flags = XFS_IRECLAIM;
> > + ip->i_flags |= XFS_IRECLAIM;
> > ip->i_ino = 0;
> > spin_unlock(&ip->i_flags_lock);
>
> I'd prefer that we don't change this - an inode that has had its
> inode number cleared should also only have XFS_IRECLAIM set in its
> flags.
>
Ok, but IMO that means we should be doing something like this:
ASSERT(ip->i_flags == 0);
ip->i_flags = XFS_IRECLAIM;
> FWIW, there is nothing that should be waiting on the flush lock by
> this point - we hold the inode locked exclusively, there are no
> other references to the inode, and we've got a clean inode. As long
> as we've cycled the flush lock inside the current XFS_ILOCK_EXCL
> hold, then we can guarantee the inode is clean and nothing is
> waiting on the flush lock as you have to hold the ILOCK before
> grabbing the flush lock.
>
> Hence it doesn't matter if we hold or don't hold the flush lock
> at the point where we set ip->i_flags = XFS_IRECLAIM and ip->i_ino =
> 0 - clearing the bit drops the flush lock if we hold it, and
> clearing ip->i_ino means that any lookups inside the RCU grace
> period (either from xfs_iflush_cluster or cache lookups) means
> they'll drop the inode without doing anything with it.
>
> Hence I think we can simply do something like:
>
Hmm, Ok. It does seem reasonable that we don't need the flush lock
across the reclaim bits.
> }
>
> - xfs_iflock(ip);
> reclaim:
> /*
> * Because we use RCU freeing we need to ensure the inode always appears
> * to be reclaimed with an invalid inode number when in the free state.
> * We do this as early as possible under the ILOCK and flush lock so
> * that xfs_iflush_cluster() can be guaranteed to detect races with us
> * here. By doing this, we guarantee that once xfs_iflush_cluster has
> * locked both the XFS_ILOCK and the flush lock that it will see either
> * a valid, flushable inode that will serialise correctly against the
> * locks below, or it will see a clean (and invalid) inode that it can
> * skip.
> + *
In that case, the above needs to be tweaked to drop the flush lock bits
as well.
> + * If we are currently holding the flush lock, then nothing else can
> + * be waiting on it as we hold the XFS_ILOCK_EXCL. Resetting the i_flags
> + * here effectively unlocks the flush lock if we currently hold it,
> + * but it's a no-op if we don't hold it. This means we don't have to
> + * add a lock cycle to the paths that drop the flush lock due to
> + * IO completion or abort here.
> */
I really don't like the lazy overload of the i_flags assignment to
"maybe" unlock the flush lock. As proven by this patch, this has the
potential to hide bugs based on flag usage (whether associated with
flush locking or some unforeseen flags added down the road). It also
sets a landmine should we decide to change the flush lock mechanism
(which we've obviously done at least once before) to something that
doesn't rely on i_flags.
IOW, I'm simply saying that if the flush lock is not required here, then
there is really no correctness requirement to "potentially" release it
atomically (with respect to i_flags_lock) with the XFS_IRECLAIM
assignment. Instead, I'd prefer to avoid the trickiness and just drop it
in the one or two places earlier in the function that currently jump to
'reclaim:' with the flush lock held.
> spin_lock(&ip->i_flags_lock);
> ip->i_flags = XFS_IRECLAIM;
> ip->i_ino = 0;
> spin_unlock(&ip->i_flags_lock);
>
> - xfs_ifunlock(ip);
> xfs_iunlock(ip, XFS_ILOCK_EXCL);
>
> The rest of the code (adding the assert to xfs_ifunlock()) looks
> fine, but this code here is quite special because of the way it
> controls behaviour of the inode lookups within the RCU grace
> period...
>
I'm assuming you'd prefer to leave the XFS_IRECLAIM assignment in
xfs_inode_free() as well (in which case, I'll move the xfs_isiflocked()
assert to before the assignment and probably add a similar i_flags == 0
assert)..?
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: [PATCH v2] xfs: fix unbalanced inode reclaim flush locking
From: Brian Foster @ 2016-10-18 15:12 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, zlang
On Tue, Oct 18, 2016 at 10:13:01AM -0400, Brian Foster wrote:
> On Tue, Oct 18, 2016 at 10:07:56AM +1100, Dave Chinner wrote:
> > On Mon, Oct 17, 2016 at 02:07:53PM -0400, Brian Foster wrote:
> > > Filesystem shutdown testing on an older distro kernel has uncovered an
> > > imbalanced locking pattern for the inode flush lock in
> > > xfs_reclaim_inode(). Specifically, there is a double unlock sequence
> > > between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
> > > "reclaim:" label.
> > >
> > > This actually does not cause obvious problems on current kernels due to
> > > the current flush lock implementation. Older kernels use a counting
> > > based flush lock mechanism, however, which effectively breaks the lock
> > > indefinitely when an already unlocked flush lock is repeatedly unlocked.
> > > Though this only currently occurs on filesystem shutdown, it has
> > > reproduced the effect of elevating an fs shutdown to a system-wide crash
> > > or hang.
> > >
> > > Because this problem exists on filesystem shutdown and thus only after
> > > unrelated catastrophic failure, issue the simple fix to reacquire the
> > > flush lock in xfs_reclaim_inode() before jumping to the reclaim code.
> > > Add an assert to xfs_ifunlock() to help prevent future occurrences of
> > > the same problem. Finally, update a couple places to bitwise-OR the
> > > reclaim flag to avoid smashing the flush lock in the process (which is
> > > based on an inode flag in current kernels). This avoids a (spurious)
> > > failure of the newly introduced xfs_ifunlock() assertion.
> > >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > Reported-by: Zorro Lang <zlang@redhat.com>
> > > ---
> > >
> > > v2:
> > > - Add comment in xfs_reclaim_inode() wrt the flush lock.
> > > - Fix XFS_IRECLAIM usage in xfs_inode_free().
> > >
> > > fs/xfs/xfs_icache.c | 9 +++++++--
> > > fs/xfs/xfs_inode.h | 11 ++++++-----
> > > 2 files changed, 13 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index 14796b7..2317b74 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -140,7 +140,7 @@ xfs_inode_free(
> > > * races.
> > > */
> > > spin_lock(&ip->i_flags_lock);
> > > - ip->i_flags = XFS_IRECLAIM;
> > > + ip->i_flags |= XFS_IRECLAIM;
> > > ip->i_ino = 0;
> > > spin_unlock(&ip->i_flags_lock);
> > >
> > > @@ -981,7 +981,12 @@ restart:
> > >
> > > if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
> > > xfs_iunpin_wait(ip);
> > > + /*
> > > + * xfs_iflush_abort() drops the flush lock. Reacquire it as the
> > > + * reclaim code expects to drop the flush lock.
> > > + */
> > > xfs_iflush_abort(ip, false);
> > > + xfs_iflock(ip);
> > > goto reclaim;
> > > }
> > > if (xfs_ipincount(ip)) {
> > > @@ -1044,7 +1049,7 @@ reclaim:
> > > * skip.
> > > */
> > > spin_lock(&ip->i_flags_lock);
> > > - ip->i_flags = XFS_IRECLAIM;
> > > + ip->i_flags |= XFS_IRECLAIM;
> > > ip->i_ino = 0;
> > > spin_unlock(&ip->i_flags_lock);
> >
> > I'd prefer that we don't change this - an inode that has had its
> > inode number cleared should also only have XFS_IRECLAIM set in its
> > flags.
> >
>
> Ok, but IMO that means we should be doing something like this:
>
> ASSERT(ip->i_flags == 0);
> ip->i_flags = XFS_IRECLAIM;
>
Testing uncovered pretty quickly that I misread (i.e., reclaim should be
the only bit set after this point) and this assertion is bogus. In fact,
it looks like this could technically be something like the following:
ASSERT(ip->i_flags & XFS_IRECLAIM);
ip->i_flags &= XFS_IRECLAIM;
... because XFS_IRECLAIM is actually already set by
xfs_reclaim_inode_grab() for any inode that comes through here. But I'll
probably just tweak the assert to check !xfs_isiflocked(ip).
Brian
> > FWIW, there is nothing that should be waiting on the flush lock by
> > this point - we hold the inode locked exclusively, there are no
> > other references to the inode, and we've got a clean inode. As long
> > as we've cycled the flush lock inside the current XFS_ILOCK_EXCL
> > hold, then we can guarantee the inode is clean and nothing is
> > waiting on the flush lock as you have to hold the ILOCK before
> > grabbing the flush lock.
> >
> > Hence it doesn't matter if we hold or don't hold the flush lock
> > at the point where we set ip->i_flags = XFS_IRECLAIM and ip->i_ino =
> > 0 - clearing the bit drops the flush lock if we hold it, and
> > clearing ip->i_ino means that any lookups inside the RCU grace
> > period (either from xfs_iflush_cluster or cache lookups) means
> > they'll drop the inode without doing anything with it.
> >
> > Hence I think we can simply do something like:
> >
>
> Hmm, Ok. It does seem reasonable that we don't need the flush lock
> across the reclaim bits.
>
> > }
> >
> > - xfs_iflock(ip);
> > reclaim:
> > /*
> > * Because we use RCU freeing we need to ensure the inode always appears
> > * to be reclaimed with an invalid inode number when in the free state.
> > * We do this as early as possible under the ILOCK and flush lock so
> > * that xfs_iflush_cluster() can be guaranteed to detect races with us
> > * here. By doing this, we guarantee that once xfs_iflush_cluster has
> > * locked both the XFS_ILOCK and the flush lock that it will see either
> > * a valid, flushable inode that will serialise correctly against the
> > * locks below, or it will see a clean (and invalid) inode that it can
> > * skip.
> > + *
>
> In that case, the above needs to be tweaked to drop the flush lock bits
> as well.
>
> > + * If we are currently holding the flush lock, then nothing else can
> > + * be waiting on it as we hold the XFS_ILOCK_EXCL. Resetting the i_flags
> > + * here effectively unlocks the flush lock if we currently hold it,
> > + * but it's a no-op if we don't hold it. This means we don't have to
> > + * add a lock cycle to the paths that drop the flush lock due to
> > + * IO completion or abort here.
> > */
>
> I really don't like the lazy overload of the i_flags assignment to
> "maybe" unlock the flush lock. As proven by this patch, this has the
> potential to hide bugs based on flag usage (whether associated with
> flush locking or some unforeseen flags added down the road). It also
> sets a landmine should we decide to change the flush lock mechanism
> (which we've obviously done at least once before) to something that
> doesn't rely on i_flags.
>
> IOW, I'm simply saying that if the flush lock is not required here, then
> there is really no correctness requirement to "potentially" release it
> atomically (with respect to i_flags_lock) with the XFS_IRECLAIM
> assignment. Instead, I'd prefer to avoid the trickiness and just drop it
> in the one or two places earlier in the function that currently jump to
> 'reclaim:' with the flush lock held.
>
> > spin_lock(&ip->i_flags_lock);
> > ip->i_flags = XFS_IRECLAIM;
> > ip->i_ino = 0;
> > spin_unlock(&ip->i_flags_lock);
> >
> > - xfs_ifunlock(ip);
> > xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >
> > The rest of the code (adding the assert to xfs_ifunlock()) looks
> > fine, but this code here is quite special because of the way it
> > controls behaviour of the inode lookups within the RCU grace
> > period...
> >
>
> I'm assuming you'd prefer to leave the XFS_IRECLAIM assignment in
> xfs_inode_free() as well (in which case, I'll move the xfs_isiflocked()
> assert to before the assignment and probably add a similar i_flags == 0
> assert)..?
>
> Brian
>
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com