linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-fsdevel@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Jorge Boncompte [DTI2]" <jorge@dti2.net>,
	Adrian Hunter <ext-adrian.hunter@nokia.com>,
	stable@kernel.org
Subject: Re: [patch] fs: avoid I_NEW inodes
Date: Tue, 10 Mar 2009 17:03:21 +0100
Message-ID: <20090310160321.GB1190@duck.suse.cz>
In-Reply-To: <20090310134106.GA15977@wotan.suse.de>

  Hi,

On Tue 10-03-09 14:41:06, Nick Piggin wrote:
> On Thu, Mar 05, 2009 at 12:12:26PM +0100, Jan Kara wrote:
> > On Thu 05-03-09 11:16:37, Nick Piggin wrote:
> > > On Thu, Mar 05, 2009 at 11:00:01AM +0100, Jan Kara wrote:
> > > > On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> > > > > after ~1hour of running. Previously, the new warnings would start immediately
> > > > > and hang would happen in under 5 minutes.
> > > >   A quick grep seems to indicate that you've still missed a few cases,
> > > > haven't you? I still see the same problem in
> > > > drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
> > > > scanning, and dquot.c:add_dquot_ref() scanning.
> > > >   Otherwise the patch looks fine.
> > > 
> > > I thought they should be OK; drop_pagecache_sb doesn't play with flags,
> > > invalidate_inodes won't if refcount is elevated, and I think add_dquot_ref
> > > won't if writecount is not elevated...
> >   Ah, ok, you are probably right.
> > 
> > > But maybe that's a bit fragile and it would be better policy to always
> > > skip I_NEW in these traversals?
> >   Yes, it seems too fragile to me. I'm not saying we have to forbid
> > everything for I_NEW inodes, but I think we should set clear, simple rules
> > for what is protected by I_NEW and then verify that all sites which can come
> > across such inodes obey them.
> 
> OK, sorry for the delay, what do you think of the following patch on top
> of the last?
  Thanks for the patch. I have a few comments. See below.

> ---
> 
> To be on the safe side, it should be less fragile to exclude I_NEW inodes
> from inode list scans by default (unless there is an important reason to
> have them).
> 
> Normally they will get excluded (e.g. by zero refcount or writecount),
> but it is a bit fragile for list walkers to know exactly which parts of
> the inode state are set up and valid to test while in I_NEW. So along these
> lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
> checks with them too -- this shouldn't be a problem, should it?)
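To make the proposed policy concrete, here is a minimal userspace sketch (not kernel code) of a list walker that tests the state bits before trusting any other inode field, mirroring the reordered checks in add_dquot_ref(). The `model_inode` struct, the `M_I_*` flag values and both helper functions are hypothetical stand-ins; the real flags are defined in include/linux/fs.h, and the real walk runs under inode_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; the kernel defines
 * I_NEW, I_FREEING, etc. in include/linux/fs.h with its own bits. */
enum {
	M_I_NEW       = 1u << 0,
	M_I_WILL_FREE = 1u << 1,
	M_I_FREEING   = 1u << 2,
};

struct model_inode {
	unsigned int state;   /* models inode->i_state */
	int count;            /* models atomic_read(&inode->i_count) */
	int writecount;       /* models atomic_read(&inode->i_writecount) */
};

/* Nonzero if a list walker may look at the rest of the inode. The state
 * test comes first, so no other field of a half-constructed (I_NEW) or
 * dying inode is ever consulted. */
static int inode_scannable(const struct model_inode *inode)
{
	return !(inode->state & (M_I_FREEING | M_I_WILL_FREE | M_I_NEW));
}

/* Models the filter order of add_dquot_ref() after the patch: state
 * first, then writecount. Returns how many inodes would be visited. */
static size_t count_dquot_candidates(const struct model_inode *inodes, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (!inode_scannable(&inodes[i]))
			continue;
		if (inodes[i].writecount == 0)
			continue;
		hits++;
	}
	return hits;
}
```

Only the first inode in a list such as {live, writable}, {I_NEW}, {I_FREEING}, {live, not writable} would be counted.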
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> 
> ---
>  fs/dquot.c                  |    6 ++++--
>  fs/drop_caches.c            |    2 +-
>  fs/inode.c                  |    2 ++
>  fs/notify/inotify/inotify.c |   16 ++++++++--------
>  4 files changed, 15 insertions(+), 11 deletions(-)
> 
> Index: linux-2.6/fs/dquot.c
> ===================================================================
> --- linux-2.6.orig/fs/dquot.c
> +++ linux-2.6/fs/dquot.c
> @@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
>  		if (!atomic_read(&inode->i_writecount))
>  			continue;
>  		if (!dqinit_needed(inode, type))
>  			continue;
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> -			continue;
>  
>  		__iget(inode);
>  		spin_unlock(&inode_lock);
> @@ -870,6 +870,8 @@ static void remove_dquot_ref(struct supe
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & I_NEW)
> +			continue;
>  		if (!IS_NOQUOTA(inode))
>  			remove_inode_dquot_ref(inode, type, tofree_head);
>  	}
  Hmm, in this scan we also have to visit I_NEW inodes, because they can
already have their quota pointers initialized; if we skipped them, we could
leave dangling quota references behind. Nasty. So just add a comment here,
something like:
/*
 *  We also have to scan I_NEW inodes because they can already have quota
 *  pointers initialized. Luckily, we need to touch only quota pointers and
 *  these have separate locking (dqptr_sem).
 */
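A small userspace model of why the remove pass cannot skip I_NEW inodes: an inode still marked I_NEW may already hold a quota pointer, and if the walker skips it, that reference survives the scan. `qmodel_inode`, `has_dquot`, `Q_I_NEW` and `remove_dquot_refs()` are hypothetical names for this sketch, and the locking (inode_lock, dqptr_sem) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define Q_I_NEW 0x1u  /* hypothetical bit for this model */

struct qmodel_inode {
	unsigned int state;  /* models inode->i_state */
	int has_dquot;       /* models a non-NULL inode->i_dquot[type] */
};

/* Models remove_dquot_ref(): drop each inode's quota pointer. If
 * skip_new is nonzero, I_NEW inodes are skipped (the behaviour the
 * review warns against). Returns how many quota references are still
 * held after the scan, i.e. would be left dangling. */
static size_t remove_dquot_refs(struct qmodel_inode *inodes, size_t n,
				int skip_new)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (skip_new && (inodes[i].state & Q_I_NEW))
			continue;
		inodes[i].has_dquot = 0;
	}
	for (size_t i = 0; i < n; i++)
		left += inodes[i].has_dquot != 0;
	return left;
}
```

With one ordinary inode and one I_NEW inode that both hold a quota pointer, skipping I_NEW leaves one reference behind, while scanning everything leaves none.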

> Index: linux-2.6/fs/drop_caches.c
> ===================================================================
> --- linux-2.6.orig/fs/drop_caches.c
> +++ linux-2.6/fs/drop_caches.c
> @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
>  			continue;
>  		if (inode->i_mapping->nrpages == 0)
>  			continue;
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c
> +++ linux-2.6/fs/inode.c
> @@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
>  		if (tmp == head)
>  			break;
>  		inode = list_entry(tmp, struct inode, i_sb_list);
> +		if (inode->i_state & I_NEW)
> +			continue;
  If somebody is still setting up inodes at this point, we are in serious
trouble, I think. So a WARN_ON would be more appropriate here.
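One possible shape for the WARN_ON suggestion, sketched in userspace C. `WARN_ON()` here is a stand-in that, like the kernel macro, reports the condition and returns it so it can sit inside an `if`. Skipping the inode after warning and reporting the list as busy is an assumption of this sketch, as are the `winode` struct, `W_I_NEW` and `model_invalidate_list()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define W_I_NEW 0x1u  /* hypothetical bit for this model */

static unsigned int warn_count;

/* Userspace stand-in for WARN_ON(): print the location when the
 * condition holds and hand the normalized condition back. */
static int warn_on(int cond, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: at %s:%d\n", file, line);
	}
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), __FILE__, __LINE__)

struct winode {
	unsigned int state;  /* models inode->i_state */
	int count;           /* models atomic_read(&inode->i_count) */
};

/* Models invalidate_list(): unreferenced inodes are disposed of, and
 * referenced ones make the call report "busy". An I_NEW inode means
 * someone is still instantiating inodes during the invalidation,
 * which should never happen, so complain loudly and leave it alone. */
static int model_invalidate_list(const struct winode *inodes, size_t n,
				 size_t *disposed)
{
	int busy = 0;

	*disposed = 0;
	for (size_t i = 0; i < n; i++) {
		if (WARN_ON(inodes[i].state & W_I_NEW)) {
			busy = 1;  /* assumption: treat as still in use */
			continue;
		}
		if (inodes[i].count == 0)
			(*disposed)++;
		else
			busy = 1;
	}
	return busy;
}
```

The design choice is that an unexpected I_NEW inode is reported rather than silently skipped, so a bug elsewhere shows up in the logs instead of being masked.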

>  		invalidate_inode_buffers(inode);
>  		if (!atomic_read(&inode->i_count)) {
>  			list_move(&inode->i_list, dispose);
> Index: linux-2.6/fs/notify/inotify/inotify.c
> ===================================================================
> --- linux-2.6.orig/fs/notify/inotify/inotify.c
> +++ linux-2.6/fs/notify/inotify/inotify.c
> @@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
>  		struct list_head *watches;
>  
>  		/*
> +		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> +		 * I_WILL_FREE which is fine because by that point the inode
> +		 * cannot have any associated watches.
> +		 */
  Update the comment to mention I_NEW as well?

> +		if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
> +
> +		/*
>  		 * If i_count is zero, the inode cannot have any watches and
>  		 * doing an __iget/iput with MS_ACTIVE clear would actually
>  		 * evict all inodes with zero i_count from icache which is
> @@ -388,14 +396,6 @@ void inotify_unmount_inodes(struct list_
>  		if (!atomic_read(&inode->i_count))
>  			continue;
>  
> -		/*
> -		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> -		 * I_WILL_FREE which is fine because by that point the inode
> -		 * cannot have any associated watches.
> -		 */
> -		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
> -			continue;
> -
>  		need_iput_tmp = need_iput;
>  		need_iput = NULL;
>  		/* In case inotify_remove_watch_locked() drops a reference. */
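A sketch of how the moved comment in inotify_unmount_inodes() might read once I_NEW is added to the check, wrapped in a userspace model of the two filters. The stated reason for skipping I_NEW (watches can only be added once an inode is fully set up) is an assumption of this sketch, as are `ninode`, the `N_I_*` values and `inotify_would_grab()`.

```c
#include <assert.h>

/* Hypothetical bits for this model; not the kernel's values. */
enum {
	N_I_NEW       = 1u << 0,
	N_I_WILL_FREE = 1u << 1,
	N_I_FREEING   = 1u << 2,
	N_I_CLEAR     = 1u << 3,
};

struct ninode {
	unsigned int state;  /* models inode->i_state */
	int count;           /* models atomic_read(&inode->i_count) */
};

/* Models the two filters at the top of inotify_unmount_inodes() after
 * the patch. Returns nonzero if the walker would __iget() the inode. */
static int inotify_would_grab(const struct ninode *inode)
{
	/*
	 * We cannot __iget() an inode in state I_CLEAR, I_FREEING or
	 * I_WILL_FREE, which is fine because by that point the inode
	 * cannot have any associated watches. I_NEW inodes are skipped
	 * too: they are still being set up, so (assuming watches are only
	 * added to fully set-up inodes) they cannot have watches yet.
	 */
	if (inode->state & (N_I_CLEAR | N_I_FREEING | N_I_WILL_FREE | N_I_NEW))
		return 0;
	/* Zero i_count: no watches, and __iget/iput would evict it. */
	if (inode->count == 0)
		return 0;
	return 1;
}
```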

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

