* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed [not found] <20060710103740.B1674239@wobbly.melbourne.sgi.com> @ 2006-07-14 10:25 ` Masayuki Saito 2006-07-17 11:05 ` Nathan Scott 0 siblings, 1 reply; 6+ messages in thread From: Masayuki Saito @ 2006-07-14 10:25 UTC (permalink / raw) To: Nathan Scott; +Cc: David Chinner, xfs, linux-kernel Hi, Nathan. >I'll leave it to Dave to comment more later (he's travelling at the >moment), since he's had his head deep in this area of the code most >recently - but my first thoughts on your patch are that its solving >the problem incorrectly. We should not be in the destroy_inode code >if the inode reference counting is correct everywhere - I would have >expected the fix to be a get/put style change, rather than adding an >inode lock and new lock/unlock semantics around an individual field; >... and if that cannot be done to fix this (eh?), then some comments >as to why refcounting didn't solve the problem here. On the basis of the above, I consider the get/put style fix which use i_count. This problem is that i_state of the inode is changed while the inode is freed in xfs filesystem. And the cause is that the inode release and xfs_iunpun() can run in parallel. To fix this problem, I added a pair of igrab()/iput() before and behind mark_inode_dirty_sync() at xfs_iunpin(). I think this can change it as follows. (1)The case that the inode release transaction runs after xfs_iunpin() is called. While mark_inode_dirty_sync() is running, igrab() promises that the inode is alive. (2)The case that xfs_iunpin() is called after iput() in the inode release transaction is called(i_count is 0). mark_inode_dirty_sync() is not called because the igrab() can not get the inode. I have made the following patch, but it is not yet tested. I would like to hear your comment, first. Signed-off-by: Masayuki Saito <m-saito@tnes.nec.co.jp> --- --- linux-2.6.17.4/fs/xfs/xfs_inode.c.orig 2006-07-14 09:44:44.187844139 +0900 +++ linux-2.6.17.4/fs/xfs/xfs_inode.c 2006-07-14 09:58:26.398486290 +0900 @@ -2751,8 +2751,14 @@ xfs_iunpin( if (vp) { struct inode *inode = vn_to_inode(vp); - if (!(inode->i_state & I_NEW)) - mark_inode_dirty_sync(inode); + if (!(inode->i_state & + (I_NEW|I_FREEING|I_CLEAR))) { + inode = igrab(inode); + if (inode != NULL) { + mark_inode_dirty_sync(inode); + iput(inode); + } + } } } wake_up(&ip->i_ipin_wait); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed 2006-07-14 10:25 ` [PATCH] xfs: i_state of inode is changed after the inode is freed Masayuki Saito @ 2006-07-17 11:05 ` Nathan Scott 2006-07-17 14:07 ` David Chinner 0 siblings, 1 reply; 6+ messages in thread From: Nathan Scott @ 2006-07-17 11:05 UTC (permalink / raw) To: Masayuki Saito; +Cc: David Chinner, xfs, linux-kernel On Fri, Jul 14, 2006 at 07:25:20PM +0900, Masayuki Saito wrote: > Hi, Nathan. > > >I'll leave it to Dave to comment more later (he's travelling at the > >moment), since he's had his head deep in this area of the code most > >recently - but my first thoughts on your patch are that its solving > >the problem incorrectly. We should not be in the destroy_inode code > >if the inode reference counting is correct everywhere - I would have > >expected the fix to be a get/put style change, rather than adding an > >inode lock and new lock/unlock semantics around an individual field; > >... and if that cannot be done to fix this (eh?), then some comments > >as to why refcounting didn't solve the problem here. > > On the basis of the above, I consider the get/put style fix which use > i_count. > > This problem is that i_state of the inode is changed while the inode > is freed in xfs filesystem. And the cause is that the inode release > and xfs_iunpun() can run in parallel. > > To fix this problem, I added a pair of igrab()/iput() before and behind > mark_inode_dirty_sync() at xfs_iunpin(). I think this can change it as > follows. > > (1)The case that the inode release transaction runs after xfs_iunpin() > is called. > While mark_inode_dirty_sync() is running, igrab() promises that the > inode is alive. > > (2)The case that xfs_iunpin() is called after iput() in the inode > release transaction is called(i_count is 0). > mark_inode_dirty_sync() is not called because the igrab() can not get > the inode. > > I have made the following patch, but it is not yet tested. > I would like to hear your comment, first. If this fixes your test case, then I like the look of it. ;-) It does seem much simpler and less invasive than the earlier fix using a spinlock. I'll run with this in my testing for awhile, let me know how your own testing goes too, please (I especially would like to hear if it fixes that reproducible failure case). thanks! -- Nathan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed 2006-07-17 11:05 ` Nathan Scott @ 2006-07-17 14:07 ` David Chinner 2006-07-24 8:01 ` Masayuki Saito 0 siblings, 1 reply; 6+ messages in thread From: David Chinner @ 2006-07-17 14:07 UTC (permalink / raw) To: Nathan Scott; +Cc: Masayuki Saito, David Chinner, xfs, linux-kernel On Mon, Jul 17, 2006 at 09:05:01PM +1000, Nathan Scott wrote: > On Fri, Jul 14, 2006 at 07:25:20PM +0900, Masayuki Saito wrote: > > Hi, Nathan. > > > > >I'll leave it to Dave to comment more later (he's travelling at the > > >moment), since he's had his head deep in this area of the code most > > >recently - but my first thoughts on your patch are that its solving > > >the problem incorrectly. We should not be in the destroy_inode code > > >if the inode reference counting is correct everywhere - I would have > > >expected the fix to be a get/put style change, rather than adding an > > >inode lock and new lock/unlock semantics around an individual field; > > >... and if that cannot be done to fix this (eh?), then some comments > > >as to why refcounting didn't solve the problem here. Looking at this a bit deeper, I think the reference counting is correct and the problem is that we are not breaking the link between the linux inode and the xfs inode atomically w.r.t xfs_iunpin() usage.... > > On the basis of the above, I consider the get/put style fix which use > > i_count. > > > > This problem is that i_state of the inode is changed while the inode > > is freed in xfs filesystem. And the cause is that the inode release > > and xfs_iunpun() can run in parallel. > > > > To fix this problem, I added a pair of igrab()/iput() before and behind > > mark_inode_dirty_sync() at xfs_iunpin(). I think this can change it as > > follows. > > > > (1)The case that the inode release transaction runs after xfs_iunpin() > > is called. > > While mark_inode_dirty_sync() is running, igrab() promises that the > > inode is alive. > > > > (2)The case that xfs_iunpin() is called after iput() in the inode > > release transaction is called(i_count is 0). > > mark_inode_dirty_sync() is not called because the igrab() can not get > > the inode. > > > > I have made the following patch, but it is not yet tested. > > I would like to hear your comment, first. > > If this fixes your test case, then I like the look of it. ;-) I don't think it fixes the problem because igrab() fails to handle the case we are hitting where I_CLEAR is set on the inode when we mark it dirty. There's nothing in this patch preventing us from sleeping after the !(I_NEW|I_FREEING|I_CLEAR) check is done and then racing with generic_drop_inode() before the igrab() can take a reference on the inode. ATM, I think the real problem is the use of the XFS_IRECLAIMABLE flag and the locking involved. Nathan, the inode hash lock is used to sync that flag between xfs_iget_core() and xfs_finish_reclaim(), but no lock is used when setting it or (now) checking in xfs_iunpin()..... Worse, the i_flags field does not use atomic bitops and there is no consistent locking protecting i_flags so updates and reads of this filed can race or even get lost.... Also, I think that xfs_iunpin() must execute atomically w.r.t xfs_reclaim(), otherwise we cannot ever safely do the checks we are doing in xfs_iunpin(). > It does seem much simpler and less invasive than the earlier fix > using a spinlock. I'll run with this in my testing for awhile, > let me know how your own testing goes too, please (I especially > would like to hear if it fixes that reproducible failure case). I think a fix is going to be much more invasive than just adding reference as my fixes appear to have only narrowed the race window and not solved it. The addition of the lock in the original patch solves the atomic xfs_iunpin()/xfs_reclaim() execution problem, but it does not solve the problems with the i_flags field. Adding a new lock may be our only option here. This will have to wait until after OLS before I get a chance to look at this further... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed 2006-07-17 14:07 ` David Chinner @ 2006-07-24 8:01 ` Masayuki Saito 2006-08-14 2:59 ` David Chinner 0 siblings, 1 reply; 6+ messages in thread From: Masayuki Saito @ 2006-07-24 8:01 UTC (permalink / raw) To: Nathan Scott, David Chinner; +Cc: xfs, linux-kernel Hi Nathan, David, Thank you for comments. >I don't think it fixes the problem because igrab() fails to handle >the case we are hitting where I_CLEAR is set on the inode when we >mark it dirty. There's nothing in this patch preventing us from >sleeping after the !(I_NEW|I_FREEING|I_CLEAR) check is done and then >racing with generic_drop_inode() before the igrab() can take a >reference on the inode. I overlooked the case. Thank you for your review. >Worse, the i_flags field does not use atomic bitops and >there is no consistent locking protecting i_flags so updates >and reads of this filed can race or even get lost.... I agree it, too. I think that we should add new spin_lock for i_flags. >I think a fix is going to be much more invasive than just adding >reference as my fixes appear to have only narrowed the race window >and not solved it. The addition of the lock in the original patch >solves the atomic xfs_iunpin()/xfs_reclaim() execution problem, >but it does not solve the problems with the i_flags field. Adding >a new lock may be our only option here. I'm considering the solution which fixes two problems([a] i_state of the inode is changed while the inode is freed in xfs filesystem and [b] the above i_flags problem) the solution: (1)Add new spin_lock(i_flags_lock) for all refernece and change places of all i_flags. (2)Add igrab()/iput() for xfs_iunpin(). It makes sure that mark_inode_dirty_sync() is never called if xfs_iunpin() runs after I_CLEAR is set. Because XFS_IRECLAIM or XFS_IRECLAIMABLE is set/checked within the spin_lock. And there is the reason that igrab()/iput() is needed even if I add new spin_lock for xfs_iunpin(). We can prevent the following case by adding them. * After passing (I_NEW|I_FREEING|I_CLEAR) check in xfs_iunpin(), I_FREEING is set. * Then mark_inode_dirty_sync() is called and i_state is changed. * Hit BUG_ON(!(inode->i_state & I_FREEING)) in clear_inode(). If these ideas seem to be correct, I'll make patches for above (1),(2). Any comment? (The following is a part of my thinking patch. Only xfs_iunpin().) --- linux-2.6.17.6/fs/xfs/xfs_inode.c.orig 2006-07-22 08:07:50.194236144 +0900 +++ linux-2.6.17.6/fs/xfs/xfs_inode.c 2006-07-25 06:07:18.062853045 +0900 @@ -2729,6 +2729,8 @@ void xfs_iunpin( xfs_inode_t *ip) { + int need_unlock; + ASSERT(atomic_read(&ip->i_pincount) > 0); if (atomic_dec_and_test(&ip->i_pincount)) { @@ -2744,6 +2746,8 @@ xfs_iunpin( * call as the inode reclaim may be blocked waiting for * the inode to become unpinned. */ + spin_lock(&ip->i_flags_lock); + need_unlock = 1; if (!(ip->i_flags & (XFS_IRECLAIM|XFS_IRECLAIMABLE))) { vnode_t *vp = XFS_ITOV_NULL(ip); @@ -2751,10 +2755,22 @@ xfs_iunpin( if (vp) { struct inode *inode = vn_to_inode(vp); - if (!(inode->i_state & I_NEW)) - mark_inode_dirty_sync(inode); + if (!(inode->i_state & + (I_NEW|I_FREEING|I_CLEAR))) { + inode = igrab(inode); + if (inode != NULL) { + mark_inode_dirty_sync(inode); + spin_unlock(&ip->i_flags_lock); + need_unlock = 0; + iput(inode); + } + } } } + if (need_unlock) { + spin_unlock(&ip->i_flags_lock); + need_unlock = 0; + } wake_up(&ip->i_ipin_wait); } } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed 2006-07-24 8:01 ` Masayuki Saito @ 2006-08-14 2:59 ` David Chinner 2006-08-22 7:48 ` Masayuki Saito 0 siblings, 1 reply; 6+ messages in thread From: David Chinner @ 2006-08-14 2:59 UTC (permalink / raw) To: Masayuki Saito; +Cc: Nathan Scott, David Chinner, xfs, linux-kernel Masayuki, Sorry for taking so long to get back to you - travelling and vacation left a mountain of email for me to delete :/ On Mon, Jul 24, 2006 at 05:01:33PM +0900, Masayuki Saito wrote: > >I think a fix is going to be much more invasive than just adding > >reference as my fixes appear to have only narrowed the race window > >and not solved it. The addition of the lock in the original patch > >solves the atomic xfs_iunpin()/xfs_reclaim() execution problem, > >but it does not solve the problems with the i_flags field. Adding > >a new lock may be our only option here. > > I'm considering the solution which fixes two problems([a] i_state of > the inode is changed while the inode is freed in xfs filesystem and > [b] the above i_flags problem) > > the solution: > (1)Add new spin_lock(i_flags_lock) for all refernece and change > places of all i_flags. > (2)Add igrab()/iput() for xfs_iunpin(). > > It makes sure that mark_inode_dirty_sync() is never called if > xfs_iunpin() runs after I_CLEAR is set. Because XFS_IRECLAIM > or XFS_IRECLAIMABLE is set/checked within the spin_lock. *nod* > And there is the reason that igrab()/iput() is needed even if I add > new spin_lock for xfs_iunpin(). We can prevent the following case > by adding them. > * After passing (I_NEW|I_FREEING|I_CLEAR) check in xfs_iunpin(), > I_FREEING is set. > * Then mark_inode_dirty_sync() is called and i_state is changed. > * Hit BUG_ON(!(inode->i_state & I_FREEING)) in clear_inode(). *nod* > If these ideas seem to be correct, I'll make patches for above (1),(2). > Any comment? > > > (The following is a part of my thinking patch. Only xfs_iunpin().) > > --- linux-2.6.17.6/fs/xfs/xfs_inode.c.orig 2006-07-22 08:07:50.194236144 +0900 > +++ linux-2.6.17.6/fs/xfs/xfs_inode.c 2006-07-25 06:07:18.062853045 +0900 > @@ -2729,6 +2729,8 @@ void > xfs_iunpin( > xfs_inode_t *ip) > { > + int need_unlock; > + > ASSERT(atomic_read(&ip->i_pincount) > 0); > > if (atomic_dec_and_test(&ip->i_pincount)) { > @@ -2744,6 +2746,8 @@ xfs_iunpin( > * call as the inode reclaim may be blocked waiting for > * the inode to become unpinned. > */ > + spin_lock(&ip->i_flags_lock); > + need_unlock = 1; > if (!(ip->i_flags & (XFS_IRECLAIM|XFS_IRECLAIMABLE))) { > vnode_t *vp = XFS_ITOV_NULL(ip); > > @@ -2751,10 +2755,22 @@ xfs_iunpin( > if (vp) { > struct inode *inode = vn_to_inode(vp); > > - if (!(inode->i_state & I_NEW)) > - mark_inode_dirty_sync(inode); > + if (!(inode->i_state & > + (I_NEW|I_FREEING|I_CLEAR))) { > + inode = igrab(inode); > + if (inode != NULL) { > + mark_inode_dirty_sync(inode); > + spin_unlock(&ip->i_flags_lock); > + need_unlock = 0; > + iput(inode); > + } > + } > } > } > + if (need_unlock) { > + spin_unlock(&ip->i_flags_lock); > + need_unlock = 0; > + } > wake_up(&ip->i_ipin_wait); > } > } Hmmm - Idon't think we should iput() before we wake up any pinned waiters. When we have a waiter on i_ipin_wait (called from xfs_iflush()), we have a thread sleeping with the inode locked. If we then call iput() and it drops the last reference, we can call back into the filesystem and start transactions. Those transactions will need to lock the inode. Hence I think the above can deadlock when racing against an inode flush. The code should probably read: if (dropped last pincount) { int need_iput = 0; struct inode *inode; spin_lock(i_flags_lock) if (!reclaimable) { if (!vp) { if (!(i_state & (NEW|CLEAR))) { inode = igrab(inode) if (inode) { need_iput = 1 mark_inode_dirty_sync(inode) } } } } spin_unlock(i_flags_lock) wake_up(&ip->i_ipin_wait) if (need_iput) iput(inode); } to avoid this possible deadlock. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: i_state of inode is changed after the inode is freed 2006-08-14 2:59 ` David Chinner @ 2006-08-22 7:48 ` Masayuki Saito 0 siblings, 0 replies; 6+ messages in thread From: Masayuki Saito @ 2006-08-22 7:48 UTC (permalink / raw) To: David Chinner; +Cc: Nathan Scott, xfs, linux-kernel Hi Nathan, David, I had the vacation too. David Chinner <dgc@sgi.com> wrote: >Hmmm - Idon't think we should iput() before we wake up any pinned waiters. >When we have a waiter on i_ipin_wait (called from xfs_iflush()), we have >a thread sleeping with the inode locked. > >If we then call iput() and it drops the last reference, we can call back >into the filesystem and start transactions. Those transactions will need >to lock the inode. Hence I think the above can deadlock when racing against >an inode flush. > >The code should probably read: > > if (dropped last pincount) { > int need_iput = 0; > struct inode *inode; > > spin_lock(i_flags_lock) > if (!reclaimable) { > if (!vp) { > if (!(i_state & (NEW|CLEAR))) { > inode = igrab(inode) > if (inode) { > need_iput = 1 > mark_inode_dirty_sync(inode) > } > } > } > } > spin_unlock(i_flags_lock) > wake_up(&ip->i_ipin_wait) > if (need_iput) > iput(inode); > } > >to avoid this possible deadlock. OK, I see your point. There is wait_event(in xfs_iunpin_wait) that should be wake_up'ed(in xfs_iunpin), so deadlock can occur. I'll update my patch and test it. Please wait for a few moments. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-08-22 7:59 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20060710103740.B1674239@wobbly.melbourne.sgi.com>
2006-07-14 10:25 ` [PATCH] xfs: i_state of inode is changed after the inode is freed Masayuki Saito
2006-07-17 11:05 ` Nathan Scott
2006-07-17 14:07 ` David Chinner
2006-07-24 8:01 ` Masayuki Saito
2006-08-14 2:59 ` David Chinner
2006-08-22 7:48 ` Masayuki Saito
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox