From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
xfs@oss.sgi.com
Subject: Re: Warning from unlock_new_inode
Date: Wed, 29 Feb 2012 21:03:24 -0600
Message-ID: <20120301030324.GX28391@sgi.com>
In-Reply-To: <20120229014906.GX3592@dastard>

Hi Dave,

Eric mentioned a really excellent bugfix earlier. This must be it.

On Wed, Feb 29, 2012 at 12:49:06PM +1100, Dave Chinner wrote:
> xfs: fix inode lookup race
>
> From: Dave Chinner <dchinner@redhat.com>
>
> When we get concurrent lookups of the same inode that is not in the
> per-AG inode cache, there is a race condition that triggers warnings
> in unlock_new_inode() indicating that we are initialising an inode
> that isn't in the correct state for a new inode.
>
> When we do an inode lookup via a file handle or a bulkstat, we don't
> serialise lookups at a higher level through the dentry cache (i.e.
> pathless lookup), and so we can get concurrent lookups of the same
> inode.
>
> The race condition is between the insertion of the inode into the
> cache in the case of a cache miss and a concurrent lookup:
>
> Thread 1                      Thread 2
> xfs_iget()
>   xfs_iget_cache_miss()
>     xfs_iread()
>     lock radix tree
>     radix_tree_insert()
>                               rcu_read_lock
>                               radix_tree_lookup
>                               lock inode flags
>                               XFS_INEW not set
>                               igrab()
>                               unlock inode flags
>                               rcu_read_unlock
>                               use uninitialised inode
>                               .....
>     lock inode flags
>     set XFS_INEW
>     unlock inode flags
>     unlock radix tree
>   xfs_setup_inode()
>     inode flags = I_NEW
>     unlock_new_inode()
>       WARNING as inode flags != I_NEW
>
> This can lead to inode corruption, inode list corruption, etc, and
> is generally a bad thing to occur.
>
> Fix this by setting XFS_INEW before inserting the inode into the
> radix tree. This will ensure any concurrent lookup will find the new
> inode with XFS_INEW set and that forces the lookup to wait until the
> XFS_INEW flag is removed before allowing the lookup to succeed.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_iget.c | 17 +++++++++++------
> 1 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
> index 05bed2b..2467ab7 100644
> --- a/fs/xfs/xfs_iget.c
> +++ b/fs/xfs/xfs_iget.c
> @@ -350,9 +350,19 @@ xfs_iget_cache_miss(
>  			BUG();
>  	}
>  
> -	spin_lock(&pag->pag_ici_lock);
> +	/* These values _must_ be set before inserting the inode into the radix
> +	 * tree as the moment it is inserted a concurrent lookup (allowed by the
> +	 * RCU locking mechanism) can find it and that lookup must see that this
> +	 * is an inode currently under construction (i.e. that XFS_INEW is set).
> +	 * The ip->i_flags_lock that protects the XFS_INEW flag forms the
> +	 * memory barrier that ensures this detection works correctly at lookup
> +	 * time.
> +	 */
> +	xfs_iflags_set(ip, XFS_INEW);
> +	ip->i_udquot = ip->i_gdquot = NULL;
>  
>  	/* insert the new inode */
> +	spin_lock(&pag->pag_ici_lock);
>  	error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
>  	if (unlikely(error)) {
>  		WARN_ON(error != -EEXIST);
> @@ -360,11 +370,6 @@ xfs_iget_cache_miss(
>  		error = EAGAIN;
>  		goto out_preload_end;
>  	}
> -
> -	/* These values _must_ be set before releasing the radix tree lock! */
^^^
So, in this comment 'radix tree lock' refers to pag->pag_ici_lock?
And pag_ici_lock provides no exclusion with radix_tree_lookup.
I believe I understand. That isn't to say that I couldn't use a
brush-up on RCU. Awesome. ;)
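
For anyone else who wants the ordering spelled out, here is a rough
userspace sketch of how I read it. It is not the XFS code: MODEL_INEW,
struct model_inode, cache_slot and every function name below are made up
for illustration, a single atomic pointer stands in for the per-AG radix
tree, and C11 sequentially consistent atomics stand in for
ip->i_flags_lock. The lookup here simply returns NULL where the real
lookup would back off and retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_INEW 0x1u	/* hypothetical stand-in for XFS_INEW */

struct model_inode {
	atomic_uint flags;	/* stand-in for the flags under i_flags_lock */
	int payload;		/* only meaningful once MODEL_INEW is cleared */
};

/* One-slot "cache": a stand-in for the per-AG radix tree. */
static _Atomic(struct model_inode *) cache_slot;

static void model_inode_init(struct model_inode *ip)
{
	atomic_init(&ip->flags, 0u);
	ip->payload = 0;
}

/* Patched ordering: mark the inode as under construction, then publish. */
static void insert_fixed(struct model_inode *ip)
{
	atomic_fetch_or(&ip->flags, MODEL_INEW);
	atomic_store(&cache_slot, ip);
}

/* Pre-patch ordering, split in two so a lookup can run in between. */
static void insert_racy_publish(struct model_inode *ip)
{
	atomic_store(&cache_slot, ip);
}

static void insert_racy_mark(struct model_inode *ip)
{
	atomic_fetch_or(&ip->flags, MODEL_INEW);
}

/*
 * Lookup takes no insert-side lock, so it can run at any point after the
 * pointer is published. It only checks the flag. NULL means "nothing
 * there, or still under construction, try again later".
 */
static struct model_inode *lookup(void)
{
	struct model_inode *ip = atomic_load(&cache_slot);

	if (ip == NULL)
		return NULL;
	if (atomic_load(&ip->flags) & MODEL_INEW)
		return NULL;
	return ip;
}

/* Finish construction and let lookups succeed. */
static void finish_init(struct model_inode *ip, int payload)
{
	assert(atomic_load(&ip->flags) & MODEL_INEW);
	ip->payload = payload;
	atomic_fetch_and(&ip->flags, ~MODEL_INEW);
}
```

Replaying the two interleavings by hand: with insert_racy_publish() a
lookup() that runs before insert_racy_mark() gets the half-built inode
back, whereas after insert_fixed() lookup() keeps returning NULL until
finish_init() has cleared the flag.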
Reviewed-by: Ben Myers <bpm@sgi.com>