From: Nick Piggin <npiggin@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: npiggin@kernel.dk, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [patch 7/7] fs: fix or note I_DIRTY handling bugs in filesystems
Date: Wed, 24 Nov 2010 11:23:05 +1100
Message-ID: <20101124002305.GA3168@amd>
In-Reply-To: <20101123225148.GZ22876@dastard>
On Wed, Nov 24, 2010 at 09:51:48AM +1100, Dave Chinner wrote:
> On Wed, Nov 24, 2010 at 01:06:17AM +1100, npiggin@kernel.dk wrote:
> > Comments?
>
> How did you test the changes?
Not widely as yet; I've only checked that a few filesystems pass deadlock
and bug tests. It's just in RFC state for now.
> > +++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c 2010-11-24 00:08:03.000000000 +1100
> > @@ -99,6 +99,7 @@ xfs_file_fsync(
> > struct xfs_trans *tp;
> > int error = 0;
> > int log_flushed = 0;
> > + unsigned dirty, mask;
> >
> > trace_xfs_file_fsync(ip);
> >
> > @@ -132,9 +133,16 @@ xfs_file_fsync(
> > * might gets cleared when the inode gets written out via the AIL
> > * or xfs_iflush_cluster.
> > */
> > - if (((inode->i_state & I_DIRTY_DATASYNC) ||
> > - ((inode->i_state & I_DIRTY_SYNC) && !datasync)) &&
> > - ip->i_update_core) {
> > + spin_lock(&inode_lock);
> > + inode_writeback_begin(inode, 1);
> > + if (datasync)
> > + mask = I_DIRTY_DATASYNC;
> > + else
> > + mask = I_DIRTY_SYNC | I_DIRTY_DATASYNC;
> > + dirty = inode->i_state & mask;
> > + inode->i_state &= ~mask;
> > + spin_unlock(&inode_lock);
> > + if (dirty && ip->i_update_core) {
>
> It looks to me like the pattern "inode_writeback_begin(); get dirty
> state from i_state" repeated for each filesystem is wrong. The
> inode_writeback_begin() helper does this:
>
> inode->i_state &= ~I_DIRTY;
>
> which clears all the dirty bits from the i_state, which means the
> followup:
>
> dirty = inode->i_state & mask;
>
> will always result in a zero value for dirty. IOWs, this seems to
> ensure that ->fsync never sees dirty inodes anymore. This will break
> fsync on XFS, and probably on all the other filesystems you modified
> to use this pattern as well.
Yes, the helper needs to do inode->i_state &= ~I_DIRTY_PAGES. Good
catch, thanks.
I had I_DIRTY there because I was initially going to return the
dirty bits; however, some cases want to check/clear bits at different
times (e.g. background writeout wants to clear DIRTY_PAGES, then do
the pagecache writeback, and then test/clear the metadata dirty bits).
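To make the problem and the fix concrete, here is a minimal userspace sketch of the pattern under discussion. It is not kernel code: the flag values are illustrative, and fsync_sample_dirty() and demo() are hypothetical names; the helper_clears parameter stands in for whatever inode_writeback_begin() clears.

```c
#include <assert.h>

/* Illustrative flag values; only the names mirror the i_state bits. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)
#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)

/*
 * Hypothetical model of the fsync-side pattern: the begin helper clears
 * helper_clears from *i_state, then the caller samples and clears the
 * metadata bits selected by datasync. Returns the sampled dirty bits.
 */
static unsigned fsync_sample_dirty(unsigned *i_state, int datasync,
                                   unsigned helper_clears)
{
	unsigned mask, dirty;

	/* Stand-in for inode_writeback_begin(). */
	*i_state &= ~helper_clears;

	if (datasync)
		mask = I_DIRTY_DATASYNC;
	else
		mask = I_DIRTY_SYNC | I_DIRTY_DATASYNC;

	dirty = *i_state & mask;
	*i_state &= ~mask;
	return dirty;
}

/* Runs both variants on the same starting state. */
static void demo(void)
{
	unsigned buggy = I_DIRTY_SYNC | I_DIRTY_PAGES;
	unsigned fixed = I_DIRTY_SYNC | I_DIRTY_PAGES;

	/* Helper clears all of I_DIRTY: the metadata bit is already gone. */
	assert(fsync_sample_dirty(&buggy, 0, I_DIRTY) == 0);

	/* Helper clears only I_DIRTY_PAGES: the metadata bit is still seen. */
	assert(fsync_sample_dirty(&fixed, 0, I_DIRTY_PAGES) == I_DIRTY_SYNC);
}
```

Calling demo() from a main() exercises both cases. With the helper clearing all of I_DIRTY, the later mask test can never be non-zero, which is the failure Dave describes.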
> Also, I think the pattern is racy with respect to concurrent page
> cache dirtiers. i.e if the inode was dirtied between writeback and
> ->fsync() in vfs_fsync_range(), then this new code clears the
> I_DIRTY_PAGES bit in i_state without writing back the dirty pages.
That gets caught in the writeback_end helper, same way as for background
writeout. It's useful to do this for the fsync helper so that the inode
actually gets marked clean if the pagecache writeback cleaned
everything.
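As a rough illustration of that re-check, the sketch below models it in userspace. It assumes the end helper looks at whether the mapping still has dirty pages and re-sets I_DIRTY_PAGES if so; the actual helper from patch 3/7 is not shown in this message, so struct model_inode and every model_* function here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define I_DIRTY_PAGES (1u << 2)	/* illustrative value */

/* Hypothetical userspace model; not the kernel's struct inode. */
struct model_inode {
	unsigned i_state;
	unsigned nr_dirty_pages;	/* stands in for dirty-tagged pagecache */
};

/* A page cache dirtier: dirties one page and marks the inode. */
static void model_dirty_page(struct model_inode *inode)
{
	inode->nr_dirty_pages++;
	inode->i_state |= I_DIRTY_PAGES;
}

/* Pagecache writeback: cleans up to nr pages. */
static void model_write_pages(struct model_inode *inode, unsigned nr)
{
	if (nr > inode->nr_dirty_pages)
		nr = inode->nr_dirty_pages;
	inode->nr_dirty_pages -= nr;
}

/* begin: optimistically clear I_DIRTY_PAGES. */
static void model_writeback_begin(struct model_inode *inode)
{
	inode->i_state &= ~I_DIRTY_PAGES;
}

/*
 * end: if pages are still dirty, put the bit back so the inode is not
 * wrongly treated as clean. Returns true if no page dirt remains.
 */
static bool model_writeback_end(struct model_inode *inode)
{
	if (inode->nr_dirty_pages > 0)
		inode->i_state |= I_DIRTY_PAGES;
	return !(inode->i_state & I_DIRTY_PAGES);
}

/* Walks the raced case: a page is dirtied after writeback, before begin. */
static void demo_race(void)
{
	struct model_inode inode = { 0, 0 };

	model_dirty_page(&inode);
	model_write_pages(&inode, 1);	/* writeback done by the fsync path */
	model_dirty_page(&inode);	/* concurrent dirtier sneaks in */
	model_writeback_begin(&inode);	/* bit cleared, page still dirty */
	assert(!model_writeback_end(&inode));
	assert(inode.i_state & I_DIRTY_PAGES);	/* dirt was caught */
}
```

In the non-raced case (no second model_dirty_page() call), model_writeback_end() returns true and the inode ends up marked clean, which is the benefit described above.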
>
> And FWIW, I'm not sure that we want to be propagating the inode_lock
> into every filesystem...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com