Re: [patch] Fix xfs_iunpin() sets I_DIRTY_SYNC after clear_inode().

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: David Chinner <dgc@sgi.com>
To: xfs@oss.sgi.com
Cc: Takenori Nagano <t-nagano@ah.jp.nec.com>
Subject: Re: [patch] Fix xfs_iunpin() sets I_DIRTY_SYNC after clear_inode().
Date: Wed, 18 Oct 2006 12:33:25 +1000	[thread overview]
Message-ID: <20061018023325.GL8394166@melbourne.sgi.com> (raw)
In-Reply-To: <20061017020218.GE8394166@melbourne.sgi.com>

On Tue, Oct 17, 2006 at 12:02:18PM +1000, David Chinner wrote:
> In doing the previous patch that removed the igrab/iput, I used the
> log force to provide synchronisation that prevented the unpin from
> ever seeing an invalid state and hence we couldn't ever get a
> use-after-free situation. What I failed to see was that we already
> have this mechanism - the i_flags_lock and the XFS_IRECLAIM* flags.
> 
> If we synchronise the setting of either of the XFS_IRECLAIM* flags
> with the breakage of the bhv_vnode<->xfs_inode link, then we can
> never get the state in xfs_iunpin() where the link has been broken
> and the XFS_IRECLAIM* flags are not set. The current usage of
> the i_flags_lock in xfs_iunpin is sufficient to provide this
> guarantee, but we are breaking the link before setting the
> XFS_IRECLAIMABLE flag in xfs_reclaim()....
> 
> So, here's another patch that doesn't have the performance problems,
> but removes the iput/igrab while still (I think) fixing the use
> after free problem. Can you try this one out, Takenori? I've
> run it through some stress tests and haven't been able to trigger
> problems.

I just hit the BUG_ON(vp == NULL) that I put in xfs_iunpin()
in this patch. The xfs inode had no link to the bhv_vnode, nor
did it have either XFS_IRECLAIM* flag set, and hence it triggered
the BUG.

The problem appears to be  a race with on lookup with an inode that
is getting deleted - xfs_iget_core() finds the xfs_inode in the
cache with the XFS_IRECLAIMABLE flag set, so it removes that flag.
It then removes the inode from the reclaim list. Then it checks to
see if the inode has been unlinked, and if the create flag is not
set we return ENOENT.

Hence if we have transactions still to be written to disk on this
inode, when xfs_iunpin finally gets called there is no reclaim flag
set so it assumes that there's a vnode assoicated with the xfs inode
and we got kaboom.

I think this is a pre-existing bug in xfs_iget_core() that can
result in a memory leak because xfs_iget_core() removes it from
the reclaim list and then forgets about at...

The following patch sits on top of the others - it may not apply
because the tree I just pulled it from has the radix tree
inode cache patches applied earlier in the series.

Comments?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/xfs_iget.c |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Index: 2.6.x-xfs-new/fs/xfs/xfs_iget.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_iget.c	2006-10-18 11:27:04.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_iget.c	2006-10-18 12:20:20.748107559 +1000
@@ -164,6 +164,34 @@ again:

 				goto again;
 			}
+
+			/*
+			 * If IRECLAIMABLE is set on this inode and lookup is
+			 * racing with unlink, then we should return an error
+			 * immediately so we don't remove it from the reclaim
+			 * list and potentially leak the inode.
+			 *
+			 * Also, there may be transactions sitting in the
+			 * incore log buffers or being flushed to disk at this
+			 * time.  We can't clear the XFS_IRECLAIMABLE flag
+			 * until these transactions have hit the disk,
+			 * otherwise we will void the guarantee the flag
+			 * provides xfs_iunpin()
+			 */
+			if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+				if (ip->i_d.di_mode == 0) &&
+				    !(flags & XFS_IGET_CREATE)) {
+					read_unlock(&ih->ih_lock);
+					return ENOENT;
+				}
+				if (xfs_ipincount(ip)) {
+					read_unlock(&ih->ih_lock);
+					xfs_log_force(mp, 0,
+						XFS_LOG_FORCE|XFS_LOG_SYNC);
+					XFS_STATS_INC(xs_ig_frecycle);
+					goto again;
+				}
+			}
 			vn_trace_exit(vp, "xfs_iget.alloc",
 				(inst_t *)__return_address);

next prev parent reply	other threads:[~2006-10-18  2:34 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-04  9:20 [patch] Fix xfs_iunpin() sets I_DIRTY_SYNC after clear_inode() Takenori Nagano
2006-10-06  3:26 ` David Chinner
2006-10-11  6:43   ` David Chinner
2006-10-12 12:20     ` Takenori Nagano
2006-10-13  1:46       ` David Chinner
2006-10-13  8:06         ` Timothy Shimmin
2006-10-13 12:17         ` Takenori Nagano
2006-10-17  2:02           ` David Chinner
2006-10-18  2:33             ` David Chinner [this message]
2006-10-18  9:07               ` David Chinner
2006-10-19  2:23                 ` Takenori Nagano
2006-10-19  4:58                   ` David Chinner
2006-10-20  4:25                     ` Takenori Nagano
2006-10-23  6:53                       ` Takenori Nagano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061018023325.GL8394166@melbourne.sgi.com \
    --to=dgc@sgi.com \
    --cc=t-nagano@ah.jp.nec.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox