From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
xfs@oss.sgi.com
Subject: Re: Warning from unlock_new_inode
Date: Wed, 29 Feb 2012 21:03:24 -0600
Message-ID: <20120301030324.GX28391@sgi.com>
In-Reply-To: <20120229014906.GX3592@dastard>

Hi Dave,

Eric mentioned a really excellent bugfix earlier. This must be it.

On Wed, Feb 29, 2012 at 12:49:06PM +1100, Dave Chinner wrote:
> xfs: fix inode lookup race
>
> From: Dave Chinner <dchinner@redhat.com>
>
> When we get concurrent lookups of the same inode that is not in the
> per-AG inode cache, there is a race condition that triggers warnings
> in unlock_new_inode() indicating that we are initialising an inode
> that isn't in the correct state for a new inode.
>
> When we do an inode lookup via a file handle or a bulkstat, we don't
> serialise lookups at a higher level through the dentry cache (i.e.
> pathless lookup), and so we can get concurrent lookups of the same
> inode.
>
> The race condition is between the insertion of the inode into the
> cache in the case of a cache miss and a concurrent lookup:
>
> Thread 1                            Thread 2
> xfs_iget()
>   xfs_iget_cache_miss()
>     xfs_iread()
>     lock radix tree
>       radix_tree_insert()
>                                     rcu_read_lock
>                                     radix_tree_lookup
>                                     lock inode flags
>                                     XFS_INEW not set
>                                     igrab()
>                                     unlock inode flags
>                                     rcu_read_unlock
>                                     use uninitialised inode
>                                     .....
>       lock inode flags
>       set XFS_INEW
>       unlock inode flags
>     unlock radix tree
>   xfs_setup_inode()
>     inode flags = I_NEW
>     unlock_new_inode()
>       WARNING as inode flags != I_NEW
>
> This can lead to inode corruption, inode list corruption, etc, and
> is generally a bad thing to occur.
>
> Fix this by setting XFS_INEW before inserting the inode into the
> radix tree. This will ensure any concurrent lookup will find the new
> inode with XFS_INEW set and that forces the lookup to wait until the
> XFS_INEW flag is removed before allowing the lookup to succeed.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_iget.c | 17 +++++++++++------
> 1 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
> index 05bed2b..2467ab7 100644
> --- a/fs/xfs/xfs_iget.c
> +++ b/fs/xfs/xfs_iget.c
> @@ -350,9 +350,19 @@ xfs_iget_cache_miss(
>  			BUG();
>  	}
>
> -	spin_lock(&pag->pag_ici_lock);
> +	/* These values _must_ be set before inserting the inode into the radix
> +	 * tree as the moment it is inserted a concurrent lookup (allowed by the
> +	 * RCU locking mechanism) can find it and that lookup must see that this
> +	 * is an inode currently under construction (i.e. that XFS_INEW is set).
> +	 * The ip->i_flags_lock that protects the XFS_INEW flag forms the
> +	 * memory barrier that ensures this detection works correctly at lookup
> +	 * time.
> +	 */
> +	xfs_iflags_set(ip, XFS_INEW);
> +	ip->i_udquot = ip->i_gdquot = NULL;
>
>  	/* insert the new inode */
> +	spin_lock(&pag->pag_ici_lock);
>  	error = radix_tree_insert(&pag->pag_ici_root, agino, ip);
>  	if (unlikely(error)) {
>  		WARN_ON(error != -EEXIST);
> @@ -360,11 +370,6 @@ xfs_iget_cache_miss(
>  		error = EAGAIN;
>  		goto out_preload_end;
>  	}
> -
> -	/* These values _must_ be set before releasing the radix tree lock! */
^^^
So, in this comment 'radix tree lock' refers to pag->pag_ici_lock?
And pag_ici_lock provides no exclusion with radix_tree_lookup.
I believe I understand. That isn't to say that I couldn't use a
brush-up on RCU. Awesome. ;)
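
To make the ordering concrete, here is a minimal userspace sketch of the
idea, under loud assumptions: it is not the XFS code. Every demo_* name is
hypothetical, a small array of C11 atomic pointers stands in for the per-AG
radix tree, and pthread mutexes stand in for pag_ici_lock and
ip->i_flags_lock. The lookup side takes no cache lock, as with the RCU
lookup, so the "under construction" flag has to be set before the object
becomes visible.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_INEW  0x1u	/* hypothetical stand-in for XFS_INEW */
#define DEMO_SLOTS 8

struct demo_inode {
	pthread_mutex_t flags_lock;	/* stand-in for ip->i_flags_lock */
	unsigned int flags;
	unsigned long ino;
};

/* Writers serialise on cache_lock; lookups deliberately do not take it. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(struct demo_inode *) cache[DEMO_SLOTS];

static void demo_inode_init(struct demo_inode *ip, unsigned long ino)
{
	pthread_mutex_init(&ip->flags_lock, NULL);
	ip->flags = 0;
	ip->ino = ino;
}

static void demo_flags_set(struct demo_inode *ip, unsigned int f)
{
	pthread_mutex_lock(&ip->flags_lock);
	ip->flags |= f;
	pthread_mutex_unlock(&ip->flags_lock);
}

static void demo_flags_clear(struct demo_inode *ip, unsigned int f)
{
	pthread_mutex_lock(&ip->flags_lock);
	ip->flags &= ~f;
	pthread_mutex_unlock(&ip->flags_lock);
}

/* Cache-miss path: mark the object as new first, then publish it. */
static int demo_cache_insert(struct demo_inode *ip)
{
	int error = 0;
	size_t slot;

	assert(ip != NULL);
	slot = ip->ino % DEMO_SLOTS;

	/* Must happen before the store below makes ip visible to lookups. */
	demo_flags_set(ip, DEMO_INEW);

	pthread_mutex_lock(&cache_lock);
	if (atomic_load(&cache[slot]) != NULL)
		error = EEXIST;
	else
		atomic_store(&cache[slot], ip);
	pthread_mutex_unlock(&cache_lock);
	return error;
}

/* Lookup path: no cache_lock, so it relies on seeing DEMO_INEW. */
static int demo_cache_lookup(unsigned long ino, struct demo_inode **ipp)
{
	struct demo_inode *ip = atomic_load(&cache[ino % DEMO_SLOTS]);
	int error = 0;

	if (ip == NULL || ip->ino != ino)
		return ENOENT;

	pthread_mutex_lock(&ip->flags_lock);
	if (ip->flags & DEMO_INEW)
		error = EAGAIN;	/* still under construction, caller retries */
	pthread_mutex_unlock(&ip->flags_lock);

	if (!error)
		*ipp = ip;
	return error;
}

/* Setup finished: clear the flag so lookups may succeed. */
static void demo_finish_setup(struct demo_inode *ip)
{
	demo_flags_clear(ip, DEMO_INEW);
}
```

In this sketch a lookup that lands between demo_cache_insert() and
demo_finish_setup() gets EAGAIN instead of a half-built object. If the flag
were set after the atomic_store(), the same lookup could succeed too early,
which is the window shown in the diagram above.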
Reviewed-by: Ben Myers <bpm@sgi.com>