From: Christoph Hellwig <hch@infradead.org>
To: Felix Blyakher <felixb@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit
Date: Sun, 16 Aug 2009 20:36:34 -0400
Message-ID: <20090817003634.GA31274@infradead.org>
In-Reply-To: <BF8D127F-8F72-4B8C-B14C-24DD1C7EC235@sgi.com>
On Sun, Aug 16, 2009 at 05:54:35PM -0500, Felix Blyakher wrote:
>> The wait_on_inode is only sensible for the non-recycle case.
>
> The case I was referring to was indeed the reclaimable one, when
> the first thread is going through
>
> xfs_iget
>   xfs_iget_cache_hit
>     if (ip->i_flags & XFS_IRECLAIMABLE) {
>             ip->i_flags |= XFS_INEW;
> -->
>   xfs_setup_inode
>     inode->i_state = I_NEW|I_LOCK;
>
>
> while another thread runs through the following sequence at the point
> marked by the arrow above:
>
> xfs_iget_cache_hit
>   if (ip->i_flags & XFS_INEW) {
>           wait_on_inode
>
> There is nothing to wait on here yet, as I_LOCK is not set.
Yeah. The new version should fix it.
Here's a version with the small update that Eric suggested. Any chance
we could still get this into 2.6.31?
--
Subject: xfs: fix locking in xfs_iget_cache_hit
From: Christoph Hellwig <hch@lst.de>
The locking in xfs_iget_cache_hit currently has numerous problems:
- we clear the reclaim tag without holding i_flags_lock, which protects
  modifications to it
- we call inode_init_always, which can sleep, with pag_ici_lock held
  (this is oss.sgi.com BZ #819)
- we acquire and drop i_flags_lock repeatedly and thus provide no
  consistency between the various flags we set/clear under it
This patch fixes all that with a major revamp of the locking in the function.
The new version acquires i_flags_lock early and only drops it once we need to
call into inode_init_always or before calling xfs_ilock.
This patch fixes a bug seen in the wild where we race modifying the reclaim tag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c 2009-08-16 20:10:25.200960533 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c 2009-08-16 20:11:30.580432781 -0300
@@ -191,80 +191,82 @@ xfs_iget_cache_hit(
int flags,
int lock_flags) __releases(pag->pag_ici_lock)
{
+ struct inode *inode = VFS_I(ip);
struct xfs_mount *mp = ip->i_mount;
- int error = EAGAIN;
+ int error;
+
+ spin_lock(&ip->i_flags_lock);
/*
- * If INEW is set this inode is being set up
- * If IRECLAIM is set this inode is being torn down
- * Pause and try again.
+ * If we are racing with another cache hit that is currently
+ * instantiating this inode or currently recycling it out of
+	 * reclaimable state, wait for the initialisation to complete
+ * before continuing.
+ *
+ * XXX(hch): eventually we should do something equivalent to
+ * wait_on_inode to wait for these flags to be cleared
+ * instead of polling for it.
*/
- if (xfs_iflags_test(ip, (XFS_INEW|XFS_IRECLAIM))) {
+ if (ip->i_flags & (XFS_INEW|XFS_IRECLAIM)) {
XFS_STATS_INC(xs_ig_frecycle);
+ error = EAGAIN;
goto out_error;
}
- /* If IRECLAIMABLE is set, we've torn down the vfs inode part */
- if (xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
-
- /*
- * If lookup is racing with unlink, then we should return an
- * error immediately so we don't remove it from the reclaim
- * list and potentially leak the inode.
- */
- if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
- error = ENOENT;
- goto out_error;
- }
+ /*
+ * If lookup is racing with unlink return an error immediately.
+ */
+ if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
+ error = ENOENT;
+ goto out_error;
+ }
+ /*
+ * If IRECLAIMABLE is set, we've torn down the VFS inode already.
+ * Need to carefully get it back into useable state.
+ */
+ if (ip->i_flags & XFS_IRECLAIMABLE) {
xfs_itrace_exit_tag(ip, "xfs_iget.alloc");
/*
- * We need to re-initialise the VFS inode as it has been
- * 'freed' by the VFS. Do this here so we can deal with
- * errors cleanly, then tag it so it can be set up correctly
- * later.
+ * We need to set XFS_INEW atomically with clearing the
+ * reclaimable tag so that we do have an indicator of the
+ * inode still being initialized.
*/
- if (inode_init_always(mp->m_super, VFS_I(ip))) {
- error = ENOMEM;
- goto out_error;
- }
+ ip->i_flags |= XFS_INEW;
+ ip->i_flags &= ~XFS_IRECLAIMABLE;
+ __xfs_inode_clear_reclaim_tag(mp, pag, ip);
- /*
- * We must set the XFS_INEW flag before clearing the
- * XFS_IRECLAIMABLE flag so that if a racing lookup does
- * not find the XFS_IRECLAIMABLE above but has the igrab()
- * below succeed we can safely check XFS_INEW to detect
- * that this inode is still being initialised.
- */
- xfs_iflags_set(ip, XFS_INEW);
- xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
+ spin_unlock(&ip->i_flags_lock);
+ read_unlock(&pag->pag_ici_lock);
- /* clear the radix tree reclaim flag as well. */
- __xfs_inode_clear_reclaim_tag(mp, pag, ip);
- } else if (!igrab(VFS_I(ip))) {
+ error = -inode_init_always(mp->m_super, inode);
+ if (error) {
+ /*
+ * Re-initializing the inode failed, and we are in deep
+ * trouble. Try to re-add it to the reclaim list.
+ */
+ read_lock(&pag->pag_ici_lock);
+ spin_lock(&ip->i_flags_lock);
+
+ ip->i_flags &= ~XFS_INEW;
+ ip->i_flags |= XFS_IRECLAIMABLE;
+ __xfs_inode_set_reclaim_tag(pag, ip);
+ goto out_error;
+ }
+ inode->i_state = I_LOCK|I_NEW;
+ } else {
/* If the VFS inode is being torn down, pause and try again. */
- XFS_STATS_INC(xs_ig_frecycle);
- goto out_error;
- } else if (xfs_iflags_test(ip, XFS_INEW)) {
- /*
- * We are racing with another cache hit that is
- * currently recycling this inode out of the XFS_IRECLAIMABLE
- * state. Wait for the initialisation to complete before
- * continuing.
- */
- wait_on_inode(VFS_I(ip));
- }
+ if (!igrab(inode)) {
+ error = EAGAIN;
+ goto out_error;
+ }
- if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) {
- error = ENOENT;
- iput(VFS_I(ip));
- goto out_error;
+ /* We've got a live one. */
+ spin_unlock(&ip->i_flags_lock);
+ read_unlock(&pag->pag_ici_lock);
}
- /* We've got a live one. */
- read_unlock(&pag->pag_ici_lock);
-
if (lock_flags != 0)
xfs_ilock(ip, lock_flags);
@@ -274,6 +276,7 @@ xfs_iget_cache_hit(
return 0;
out_error:
+ spin_unlock(&ip->i_flags_lock);
read_unlock(&pag->pag_ici_lock);
return error;
}
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c 2009-08-16 20:01:14.632430664 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c 2009-08-16 20:10:25.740968342 -0300
@@ -708,6 +708,16 @@ xfs_reclaim_inode(
return 0;
}
+void
+__xfs_inode_set_reclaim_tag(
+ struct xfs_perag *pag,
+ struct xfs_inode *ip)
+{
+ radix_tree_tag_set(&pag->pag_ici_root,
+ XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+ XFS_ICI_RECLAIM_TAG);
+}
+
/*
* We set the inode flag atomically with the radix tree tag.
* Once we get tag lookups on the radix tree, this inode flag
@@ -722,8 +732,7 @@ xfs_inode_set_reclaim_tag(
read_lock(&pag->pag_ici_lock);
spin_lock(&ip->i_flags_lock);
- radix_tree_tag_set(&pag->pag_ici_root,
- XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+ __xfs_inode_set_reclaim_tag(pag, ip);
__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
spin_unlock(&ip->i_flags_lock);
read_unlock(&pag->pag_ici_lock);
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.h 2009-08-16 20:01:14.640431122 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.h 2009-08-16 20:10:25.744967593 -0300
@@ -48,6 +48,7 @@ int xfs_reclaim_inode(struct xfs_inode *
int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
void xfs_inode_set_reclaim_tag(struct xfs_inode *ip);
+void __xfs_inode_set_reclaim_tag(struct xfs_perag *pag, struct xfs_inode *ip);
void xfs_inode_clear_reclaim_tag(struct xfs_inode *ip);
void __xfs_inode_clear_reclaim_tag(struct xfs_mount *mp, struct xfs_perag *pag,
struct xfs_inode *ip);
Thread overview: 18+ messages
2009-08-04 14:15 [PATCH 0/4] XFS iget fixes Christoph Hellwig
2009-08-04 14:15 ` [PATCH 1/4] xfs: fix locking in xfs_iget_cache_hit Christoph Hellwig
2009-08-07 17:25 ` Felix Blyakher
2009-08-10 17:09 ` Christoph Hellwig
2009-08-16 21:01 ` Eric Sandeen
2009-08-16 22:54 ` Felix Blyakher
2009-08-17 0:36 ` Christoph Hellwig [this message]
2009-08-17 3:05 ` Felix Blyakher
2009-08-04 14:15 ` [PATCH 2/4] fix inode_init_always calling convention Christoph Hellwig
2009-08-06 22:30 ` Eric Sandeen
2009-08-07 17:39 ` Felix Blyakher
2009-08-07 18:09 ` Felix Blyakher
2009-08-04 14:15 ` [PATCH 3/4] add __destroy_inode Christoph Hellwig
2009-08-06 22:56 ` Eric Sandeen
2009-08-07 18:20 ` Felix Blyakher
2009-08-04 14:15 ` [PATCH 4/4] xfs: add xfs_inode_free Christoph Hellwig
2009-08-06 23:54 ` Eric Sandeen
2009-08-07 18:22 ` Felix Blyakher