From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Eric Sandeen <sandeen@sandeen.net>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
Nick Piggin <npiggin@suse.de>, Jeff Layton <jlayton@redhat.com>
Subject: Re: [PATCH] skip I_CLEAR state inodes
Date: Wed, 3 Jun 2009 22:47:11 +0800
Message-ID: <20090603144711.GC5738@localhost>
In-Reply-To: <20090603141636.GC5650@duck.suse.cz>

On Wed, Jun 03, 2009 at 10:16:36PM +0800, Jan Kara wrote:
> On Wed 03-06-09 22:10:21, Wu Fengguang wrote:
> > On Tue, Jun 02, 2009 at 07:37:36PM +0800, Jan Kara wrote:
> > > On Tue 02-06-09 16:55:23, Wu Fengguang wrote:
> > > > On Tue, Jun 02, 2009 at 05:38:35AM +0800, Eric Sandeen wrote:
> > > > > Wu Fengguang wrote:
> > > > > > Add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and
> > > > > > add_dquot_ref().
> > > > > >
> > > > > > clear_inode() will switch inode state from I_FREEING to I_CLEAR,
> > > > > > and do so _outside_ of inode_lock. So any I_FREEING testing is
> > > > > > incomplete without the testing of I_CLEAR.
> > > > > >
> > > > > > Masayoshi MIZUMA first discovered the bug in drop_pagecache_sb() and
> > > > > > > Jan Kara reminded me to fix the other two cases. Thanks!
> > > > >
> > > > > Is there a reason it's not done for __sync_single_inode as well?
> > > >
> > > > It escaped my glance because it doesn't have an obvious '|' in the line ;)
> > > >
> > > > > Jeff Layton asked the question and I'm following it up :)
> > > > >
> > > > > __sync_single_inode currently only tests I_FREEING, but I think we are
> > > > > safe because __sync_single_inode sets I_SYNC, and clear_inode waits for
> > > > > I_SYNC to be cleared before it changes I_STATE.
> > > >
> > > > But I_SYNC is removed just before the I_FREEING test, so we still have
> > > > a small race window?
> > > >
> > > > > On the other hand, testing I_CLEAR here probably would be safe anyway,
> > > > > and it'd be bonus points for consistency?
> > > >
> > > > So let's add the I_CLEAR test?
> > > >
> > > > > Same basic question for generic_sync_sb_inodes, which has a
> > > > > BUG_ON(inode->i_state & I_FREEING), seems like this could check I_CLEAR
> > > > > as well?
> > > >
> > > > Yes, we can add I_CLEAR here to catch more error conditions.
> > > >
> > > > Thanks,
> > > > Fengguang
> > > >
> > > > ---
> > > > skip I_CLEAR state inodes in writeback routines
> > > >
> > > > The I_FREEING test in __sync_single_inode() is racy because
> > > > clear_inode() can set i_state to I_CLEAR between the clear of I_SYNC
> > > > and the test of I_FREEING.
> > > >
> > > > Also extend the coverage of BUG_ON(I_FREEING) to I_CLEAR.
> > > >
> > > > Reported-by: Jeff Layton <jlayton@redhat.com>
> > > > Reported-by: Eric Sandeen <sandeen@sandeen.net>
> > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > ---
> > > > fs/fs-writeback.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > --- linux.orig/fs/fs-writeback.c
> > > > +++ linux/fs/fs-writeback.c
> > > > @@ -316,7 +316,7 @@ __sync_single_inode(struct inode *inode,
> > > > spin_lock(&inode_lock);
> > > > WARN_ON(inode->i_state & I_NEW);
> > > > inode->i_state &= ~I_SYNC;
> > > > - if (!(inode->i_state & I_FREEING)) {
> > > > + if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
> > > > if (!(inode->i_state & I_DIRTY) &&
> > > > mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> > > Is the whole if needed? I had the impression that everyone calling
> > > __sync_single_inode() had better make sure it does not race with inode
> > > freeing... So a WARN_ON would be more appropriate IMHO.
> > >
> > > > /*
> > > > @@ -518,7 +518,7 @@ void generic_sync_sb_inodes(struct super
> > > > if (current_is_pdflush() && !writeback_acquire(bdi))
> > > > break;
> > > >
> > > > - BUG_ON(inode->i_state & I_FREEING);
> > > > + BUG_ON(inode->i_state & (I_FREEING | I_CLEAR));
> > > > __iget(inode);
> > > > pages_skipped = wbc->pages_skipped;
> > > > __writeback_single_inode(inode, wbc);
> > > Looking at this code, it looks a bit suspicious. What prevents this s_io
> > > list scan from racing with inode freeing? In particular generic_forget_inode()
> >
> > Good catch.
> >
> > > can drop inode_lock to write the inode and in the meantime
> > > generic_sync_sb_inodes() can come, get a reference to the inode and start
> > > its writeback... Subsequent iput() would then call generic_forget_inode()
> >
> > Another possibility:
> >
> > generic_forget_inode
> > inode->i_state |= I_WILL_FREE;
> > spin_unlock(&inode_lock);
> > generic_sync_sb_inodes()
> > spin_lock(&inode_lock);
> > __iget(inode);
> > __writeback_single_inode
> > // see non zero i_count
> > WARN_ON(inode->i_state & I_WILL_FREE);
> >
> > I'm wondering why we didn't see reports of the last WARN_ON().
> > Did we miss something?
> I meant the above race in my description ;-). Anyway, the race can happen
> only if we are unmounting the filesystem (normally, we bail out on
> sb->s_flags & MS_ACTIVE check - yes, it's a bit hidden and it also took me
> a while to understand why we weren't seeing tons of warnings...).
Ah, OK. I just checked all three callers of generic_sync_sb_inodes():
- writeback_inodes(): umount is prevented
- pohmelfs_kill_super(): called just before umount
- ubifs calls: too complex to be obvious...
At least the first two cases are safe, which is why we didn't see the error report ;)
> > > on the inode again. So shouldn't we skip I_FREEING|I_CLEAR|I_WILL_FREE|I_NEW
> > > inodes in this scan like we do later in the function for another scan?
Yes, we should do this, at least for safety. I_WILL_FREE means
generic_forget_inode() is going to write back the inode on its own, so
generic_sync_sb_inodes() had better not wade in.
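
To make the proposed skip concrete, here is a minimal user-space sketch, not a
kernel patch. The flag names are the ones discussed in this thread; the bit
values, the I_DIRTY placeholder, the I_SKIP_MASK name and both helpers are
assumptions made up for illustration and are not taken from the tree.

```c
#include <stdbool.h>

/*
 * Placeholder bit values for this sketch only (assumption); the real
 * definitions live in include/linux/fs.h.
 */
#define I_DIRTY      (1u << 0)
#define I_NEW        (1u << 1)
#define I_WILL_FREE  (1u << 2)
#define I_FREEING    (1u << 3)
#define I_CLEAR      (1u << 4)

/* Hypothetical name for the combined mask suggested for the s_io scan. */
#define I_SKIP_MASK  (I_FREEING | I_CLEAR | I_WILL_FREE | I_NEW)

/*
 * Old-style test: looks at I_FREEING only, so an inode that
 * clear_inode() has already moved to I_CLEAR is not caught.
 */
static inline bool old_test_skips(unsigned int i_state)
{
	return (i_state & I_FREEING) != 0;
}

/*
 * Hypothetical helper: true when the scan should leave the inode
 * alone because it is new or on its way to being freed.
 */
static inline bool inode_skip_writeback(unsigned int i_state)
{
	return (i_state & I_SKIP_MASK) != 0;
}
```

With these definitions, old_test_skips(I_CLEAR) is false while
inode_skip_writeback(I_CLEAR) is true, which is the gap the wider mask is
meant to close. A plain dirty inode is not skipped by either test.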
Thanks,
Fengguang