Re: [PATCH] xfs: i_state of inode is changed after the inode is freed

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed
       [not found] <20060710103740.B1674239@wobbly.melbourne.sgi.com>
@ 2006-07-14 10:25 ` Masayuki Saito
  2006-07-17 11:05   ` Nathan Scott
  0 siblings, 1 reply; 6+ messages in thread
From: Masayuki Saito @ 2006-07-14 10:25 UTC (permalink / raw)
  To: Nathan Scott; +Cc: David Chinner, xfs, linux-kernel

Hi, Nathan.

>I'll leave it to Dave to comment more later (he's travelling at the
>moment), since he's had his head deep in this area of the code most
>recently - but my first thoughts on your patch are that its solving
>the problem incorrectly.  We should not be in the destroy_inode code
>if the inode reference counting is correct everywhere - I would have
>expected the fix to be a get/put style change, rather than adding an
>inode lock and new lock/unlock semantics around an individual field;
>... and if that cannot be done to fix this (eh?), then some comments
>as to why refcounting didn't solve the problem here.

On the basis of the above, I consider the get/put style fix which use
i_count.

This problem is that i_state of the inode is changed while the inode
is freed in xfs filesystem.  And the cause is that the inode release
and xfs_iunpun() can run in parallel.

To fix this problem, I added a pair of igrab()/iput() before and behind
mark_inode_dirty_sync() at xfs_iunpin().  I think this can change it as
follows.

(1)The case that the inode release transaction runs after xfs_iunpin()
is called.
While mark_inode_dirty_sync() is running, igrab() promises that the
inode is alive.

(2)The case that xfs_iunpin() is called after iput() in the inode
release transaction is called(i_count is 0).
mark_inode_dirty_sync() is not called because the igrab() can not get
the inode.

I have made the following patch, but it is not yet tested.
I would like to hear your comment, first.

Signed-off-by: Masayuki Saito <m-saito@tnes.nec.co.jp>
---

--- linux-2.6.17.4/fs/xfs/xfs_inode.c.orig	2006-07-14 09:44:44.187844139 +0900
+++ linux-2.6.17.4/fs/xfs/xfs_inode.c	2006-07-14 09:58:26.398486290 +0900
@@ -2751,8 +2751,14 @@ xfs_iunpin(
 			if (vp) {
 				struct inode	*inode = vn_to_inode(vp);

-				if (!(inode->i_state & I_NEW))
-					mark_inode_dirty_sync(inode);
+				if (!(inode->i_state & 
+						(I_NEW|I_FREEING|I_CLEAR))) {
+					inode = igrab(inode);
+					if (inode != NULL) {
+						mark_inode_dirty_sync(inode);
+						iput(inode);
+					}
+				}
 			}
 		}
 		wake_up(&ip->i_ipin_wait);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed
  2006-07-14 10:25 ` [PATCH] xfs: i_state of inode is changed after the inode is freed Masayuki Saito
@ 2006-07-17 11:05   ` Nathan Scott
  2006-07-17 14:07     ` David Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Nathan Scott @ 2006-07-17 11:05 UTC (permalink / raw)
  To: Masayuki Saito; +Cc: David Chinner, xfs, linux-kernel

On Fri, Jul 14, 2006 at 07:25:20PM +0900, Masayuki Saito wrote:
> Hi, Nathan.
> 
> >I'll leave it to Dave to comment more later (he's travelling at the
> >moment), since he's had his head deep in this area of the code most
> >recently - but my first thoughts on your patch are that its solving
> >the problem incorrectly.  We should not be in the destroy_inode code
> >if the inode reference counting is correct everywhere - I would have
> >expected the fix to be a get/put style change, rather than adding an
> >inode lock and new lock/unlock semantics around an individual field;
> >... and if that cannot be done to fix this (eh?), then some comments
> >as to why refcounting didn't solve the problem here.
> 
> On the basis of the above, I consider the get/put style fix which use
> i_count.
> 
> This problem is that i_state of the inode is changed while the inode
> is freed in xfs filesystem.  And the cause is that the inode release
> and xfs_iunpun() can run in parallel.
> 
> To fix this problem, I added a pair of igrab()/iput() before and behind
> mark_inode_dirty_sync() at xfs_iunpin().  I think this can change it as
> follows.
> 
> (1)The case that the inode release transaction runs after xfs_iunpin()
> is called.
> While mark_inode_dirty_sync() is running, igrab() promises that the
> inode is alive.
> 
> (2)The case that xfs_iunpin() is called after iput() in the inode
> release transaction is called(i_count is 0).
> mark_inode_dirty_sync() is not called because the igrab() can not get
> the inode.
> 
> I have made the following patch, but it is not yet tested.
> I would like to hear your comment, first.

If this fixes your test case, then I like the look of it. ;-)

It does seem much simpler and less invasive than the earlier fix
using a spinlock.  I'll run with this in my testing for awhile,
let me know how your own testing goes too, please (I especially
would like to hear if it fixes that reproducible failure case).

thanks!

-- 
Nathan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed
  2006-07-17 11:05   ` Nathan Scott
@ 2006-07-17 14:07     ` David Chinner
  2006-07-24  8:01       ` Masayuki Saito
  0 siblings, 1 reply; 6+ messages in thread
From: David Chinner @ 2006-07-17 14:07 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Masayuki Saito, David Chinner, xfs, linux-kernel

On Mon, Jul 17, 2006 at 09:05:01PM +1000, Nathan Scott wrote:
> On Fri, Jul 14, 2006 at 07:25:20PM +0900, Masayuki Saito wrote:
> > Hi, Nathan.
> > 
> > >I'll leave it to Dave to comment more later (he's travelling at the
> > >moment), since he's had his head deep in this area of the code most
> > >recently - but my first thoughts on your patch are that its solving
> > >the problem incorrectly.  We should not be in the destroy_inode code
> > >if the inode reference counting is correct everywhere - I would have
> > >expected the fix to be a get/put style change, rather than adding an
> > >inode lock and new lock/unlock semantics around an individual field;
> > >... and if that cannot be done to fix this (eh?), then some comments
> > >as to why refcounting didn't solve the problem here.

Looking at this a bit deeper, I think the reference counting is
correct and the problem is that we are not breaking the link between
the linux inode and the xfs inode atomically w.r.t xfs_iunpin()
usage....

> > On the basis of the above, I consider the get/put style fix which use
> > i_count.
> > 
> > This problem is that i_state of the inode is changed while the inode
> > is freed in xfs filesystem.  And the cause is that the inode release
> > and xfs_iunpun() can run in parallel.
> > 
> > To fix this problem, I added a pair of igrab()/iput() before and behind
> > mark_inode_dirty_sync() at xfs_iunpin().  I think this can change it as
> > follows.
> > 
> > (1)The case that the inode release transaction runs after xfs_iunpin()
> > is called.
> > While mark_inode_dirty_sync() is running, igrab() promises that the
> > inode is alive.
> > 
> > (2)The case that xfs_iunpin() is called after iput() in the inode
> > release transaction is called(i_count is 0).
> > mark_inode_dirty_sync() is not called because the igrab() can not get
> > the inode.
> > 
> > I have made the following patch, but it is not yet tested.
> > I would like to hear your comment, first.
> 
> If this fixes your test case, then I like the look of it. ;-)

I don't think it fixes the problem because igrab() fails to handle
the case we are hitting where I_CLEAR is set on the inode when we
mark it dirty. There's nothing in this patch preventing us from
sleeping after the !(I_NEW|I_FREEING|I_CLEAR) check is done and then
racing with generic_drop_inode() before the igrab() can take a
reference on the inode.

ATM, I think the real problem is the use of the XFS_IRECLAIMABLE
flag and the locking involved. Nathan, the inode hash lock
is used to sync that flag between xfs_iget_core() and
xfs_finish_reclaim(), but no lock is used when setting it or
(now) checking in xfs_iunpin().....

Worse, the i_flags field does not use atomic bitops and
there is no consistent locking protecting i_flags so updates
and reads of this filed can race or even get lost....

Also, I think that xfs_iunpin() must execute atomically w.r.t
xfs_reclaim(), otherwise we cannot ever safely do the checks
we are doing in xfs_iunpin().

> It does seem much simpler and less invasive than the earlier fix
> using a spinlock.  I'll run with this in my testing for awhile,
> let me know how your own testing goes too, please (I especially
> would like to hear if it fixes that reproducible failure case).

I think a fix is going to be much more invasive than just adding
reference as my fixes appear to have only narrowed the race window
and not solved it. The addition of the lock in the original patch
solves the atomic xfs_iunpin()/xfs_reclaim() execution problem,
but it does not solve the problems with the i_flags field. Adding
a new lock may be our only option here.

This will have to wait until after OLS before I get a chance to
look at this further...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed
  2006-07-17 14:07     ` David Chinner
@ 2006-07-24  8:01       ` Masayuki Saito
  2006-08-14  2:59         ` David Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Masayuki Saito @ 2006-07-24  8:01 UTC (permalink / raw)
  To: Nathan Scott, David Chinner; +Cc: xfs, linux-kernel

Hi Nathan, David,
Thank you for comments.

>I don't think it fixes the problem because igrab() fails to handle
>the case we are hitting where I_CLEAR is set on the inode when we
>mark it dirty. There's nothing in this patch preventing us from
>sleeping after the !(I_NEW|I_FREEING|I_CLEAR) check is done and then
>racing with generic_drop_inode() before the igrab() can take a
>reference on the inode.

I overlooked the case.  Thank you for your review.


>Worse, the i_flags field does not use atomic bitops and
>there is no consistent locking protecting i_flags so updates
>and reads of this filed can race or even get lost....

I agree it, too.  I think that we should add new spin_lock for
i_flags.


>I think a fix is going to be much more invasive than just adding
>reference as my fixes appear to have only narrowed the race window
>and not solved it. The addition of the lock in the original patch
>solves the atomic xfs_iunpin()/xfs_reclaim() execution problem,
>but it does not solve the problems with the i_flags field. Adding
>a new lock may be our only option here.

I'm considering the solution which fixes two problems([a] i_state of
the inode is changed while the inode is freed in xfs filesystem and
[b] the above i_flags problem)

the solution:
(1)Add new spin_lock(i_flags_lock) for all refernece and change
   places of all i_flags.
(2)Add igrab()/iput() for xfs_iunpin().

It makes sure that mark_inode_dirty_sync() is never called if
xfs_iunpin() runs after I_CLEAR is set.  Because XFS_IRECLAIM
or XFS_IRECLAIMABLE is set/checked within the spin_lock.

And there is the reason that igrab()/iput() is needed even if I add
new spin_lock for xfs_iunpin().  We can prevent the following case
by adding them.
* After passing (I_NEW|I_FREEING|I_CLEAR) check in xfs_iunpin(),
  I_FREEING is set.
* Then mark_inode_dirty_sync() is called and i_state is changed.
* Hit BUG_ON(!(inode->i_state & I_FREEING)) in clear_inode().

If these ideas seem to be correct, I'll make patches for above (1),(2).
Any comment?


(The following is a part of my thinking patch.  Only xfs_iunpin().)

--- linux-2.6.17.6/fs/xfs/xfs_inode.c.orig	2006-07-22 08:07:50.194236144 +0900
+++ linux-2.6.17.6/fs/xfs/xfs_inode.c	2006-07-25 06:07:18.062853045 +0900
@@ -2729,6 +2729,8 @@ void
 xfs_iunpin(
 	xfs_inode_t	*ip)
 {
+	int need_unlock;
+
 	ASSERT(atomic_read(&ip->i_pincount) > 0);
 
 	if (atomic_dec_and_test(&ip->i_pincount)) {
@@ -2744,6 +2746,8 @@ xfs_iunpin(
 		 * call as the inode reclaim may be blocked waiting for
 		 * the inode to become unpinned.
 		 */
+		spin_lock(&ip->i_flags_lock);
+		need_unlock = 1;
 		if (!(ip->i_flags & (XFS_IRECLAIM|XFS_IRECLAIMABLE))) {
 			vnode_t	*vp = XFS_ITOV_NULL(ip);
 
@@ -2751,10 +2755,22 @@ xfs_iunpin(
 			if (vp) {
 				struct inode	*inode = vn_to_inode(vp);
 
-				if (!(inode->i_state & I_NEW))
-					mark_inode_dirty_sync(inode);
+				if (!(inode->i_state &
+						(I_NEW|I_FREEING|I_CLEAR))) {
+					inode = igrab(inode);
+					if (inode != NULL) {
+						mark_inode_dirty_sync(inode);
+						spin_unlock(&ip->i_flags_lock);
+						need_unlock = 0;
+						iput(inode);
+					}
+				}
 			}
 		}
+		if (need_unlock) {
+			spin_unlock(&ip->i_flags_lock);
+			need_unlock = 0;
+		}
 		wake_up(&ip->i_ipin_wait);
 	}
 }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed
  2006-07-24  8:01       ` Masayuki Saito
@ 2006-08-14  2:59         ` David Chinner
  2006-08-22  7:48           ` Masayuki Saito
  0 siblings, 1 reply; 6+ messages in thread
From: David Chinner @ 2006-08-14  2:59 UTC (permalink / raw)
  To: Masayuki Saito; +Cc: Nathan Scott, David Chinner, xfs, linux-kernel

Masayuki,

Sorry for taking so long to get back to you - travelling and vacation
left a mountain of email for me to delete :/

On Mon, Jul 24, 2006 at 05:01:33PM +0900, Masayuki Saito wrote:
> >I think a fix is going to be much more invasive than just adding
> >reference as my fixes appear to have only narrowed the race window
> >and not solved it. The addition of the lock in the original patch
> >solves the atomic xfs_iunpin()/xfs_reclaim() execution problem,
> >but it does not solve the problems with the i_flags field. Adding
> >a new lock may be our only option here.
> 
> I'm considering the solution which fixes two problems([a] i_state of
> the inode is changed while the inode is freed in xfs filesystem and
> [b] the above i_flags problem)
> 
> the solution:
> (1)Add new spin_lock(i_flags_lock) for all refernece and change
>    places of all i_flags.
> (2)Add igrab()/iput() for xfs_iunpin().
> 
> It makes sure that mark_inode_dirty_sync() is never called if
> xfs_iunpin() runs after I_CLEAR is set.  Because XFS_IRECLAIM
> or XFS_IRECLAIMABLE is set/checked within the spin_lock.

*nod*

> And there is the reason that igrab()/iput() is needed even if I add
> new spin_lock for xfs_iunpin().  We can prevent the following case
> by adding them.
> * After passing (I_NEW|I_FREEING|I_CLEAR) check in xfs_iunpin(),
>   I_FREEING is set.
> * Then mark_inode_dirty_sync() is called and i_state is changed.
> * Hit BUG_ON(!(inode->i_state & I_FREEING)) in clear_inode().

*nod*

> If these ideas seem to be correct, I'll make patches for above (1),(2).
> Any comment?
> 
> 
> (The following is a part of my thinking patch.  Only xfs_iunpin().)
> 
> --- linux-2.6.17.6/fs/xfs/xfs_inode.c.orig	2006-07-22 08:07:50.194236144 +0900
> +++ linux-2.6.17.6/fs/xfs/xfs_inode.c	2006-07-25 06:07:18.062853045 +0900
> @@ -2729,6 +2729,8 @@ void
>  xfs_iunpin(
>  	xfs_inode_t	*ip)
>  {
> +	int need_unlock;
> +
>  	ASSERT(atomic_read(&ip->i_pincount) > 0);
>  
>  	if (atomic_dec_and_test(&ip->i_pincount)) {
> @@ -2744,6 +2746,8 @@ xfs_iunpin(
>  		 * call as the inode reclaim may be blocked waiting for
>  		 * the inode to become unpinned.
>  		 */
> +		spin_lock(&ip->i_flags_lock);
> +		need_unlock = 1;
>  		if (!(ip->i_flags & (XFS_IRECLAIM|XFS_IRECLAIMABLE))) {
>  			vnode_t	*vp = XFS_ITOV_NULL(ip);
>  
> @@ -2751,10 +2755,22 @@ xfs_iunpin(
>  			if (vp) {
>  				struct inode	*inode = vn_to_inode(vp);
>  
> -				if (!(inode->i_state & I_NEW))
> -					mark_inode_dirty_sync(inode);
> +				if (!(inode->i_state &
> +						(I_NEW|I_FREEING|I_CLEAR))) {
> +					inode = igrab(inode);
> +					if (inode != NULL) {
> +						mark_inode_dirty_sync(inode);
> +						spin_unlock(&ip->i_flags_lock);
> +						need_unlock = 0;
> +						iput(inode);
> +					}
> +				}
>  			}
>  		}
> +		if (need_unlock) {
> +			spin_unlock(&ip->i_flags_lock);
> +			need_unlock = 0;
> +		}
>  		wake_up(&ip->i_ipin_wait);
>  	}
>  }

Hmmm - Idon't think we should iput() before we wake up any pinned waiters.
When we have a waiter on i_ipin_wait (called from xfs_iflush()), we have
a thread sleeping with the inode locked.

If we then call iput() and it drops the last reference, we can call back
into the filesystem and start transactions. Those transactions will need
to lock the inode. Hence I think the above can deadlock when racing against
an inode flush. 

The code should probably read:

	if (dropped last pincount) {
		int need_iput = 0;
		struct inode *inode;

		spin_lock(i_flags_lock)
		if (!reclaimable) {
			if (!vp) {
				if (!(i_state & (NEW|CLEAR))) {
					inode = igrab(inode)
					if (inode) {
						need_iput = 1	
						mark_inode_dirty_sync(inode)
					}
				}
			}
		}
		spin_unlock(i_flags_lock)
		wake_up(&ip->i_ipin_wait)
		if (need_iput)
			iput(inode);
	}

to avoid this possible deadlock.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed
  2006-08-14  2:59         ` David Chinner
@ 2006-08-22  7:48           ` Masayuki Saito
  0 siblings, 0 replies; 6+ messages in thread
From: Masayuki Saito @ 2006-08-22  7:48 UTC (permalink / raw)
  To: David Chinner; +Cc: Nathan Scott, xfs, linux-kernel

Hi Nathan, David,

I had the vacation too.

David Chinner <dgc@sgi.com> wrote:
>Hmmm - Idon't think we should iput() before we wake up any pinned waiters.
>When we have a waiter on i_ipin_wait (called from xfs_iflush()), we have
>a thread sleeping with the inode locked.
>
>If we then call iput() and it drops the last reference, we can call back
>into the filesystem and start transactions. Those transactions will need
>to lock the inode. Hence I think the above can deadlock when racing against
>an inode flush. 
>
>The code should probably read:
>
>	if (dropped last pincount) {
>		int need_iput = 0;
>		struct inode *inode;
>
>		spin_lock(i_flags_lock)
>		if (!reclaimable) {
>			if (!vp) {
>				if (!(i_state & (NEW|CLEAR))) {
>					inode = igrab(inode)
>					if (inode) {
>						need_iput = 1	
>						mark_inode_dirty_sync(inode)
>					}
>				}
>			}
>		}
>		spin_unlock(i_flags_lock)
>		wake_up(&ip->i_ipin_wait)
>		if (need_iput)
>			iput(inode);
>	}
>
>to avoid this possible deadlock.

OK, I see your point.  There is wait_event(in xfs_iunpin_wait) that should
be wake_up'ed(in xfs_iunpin), so deadlock can occur.

I'll update my patch and test it.  Please wait for a few moments.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-08-22  7:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20060710103740.B1674239@wobbly.melbourne.sgi.com>
2006-07-14 10:25 ` [PATCH] xfs: i_state of inode is changed after the inode is freed Masayuki Saito
2006-07-17 11:05   ` Nathan Scott
2006-07-17 14:07     ` David Chinner
2006-07-24  8:01       ` Masayuki Saito
2006-08-14  2:59         ` David Chinner
2006-08-22  7:48           ` Masayuki Saito

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox