From: Nick Piggin <npiggin@kernel.dk>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Nick Piggin <npiggin@kernel.dk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-ext4@vger.kernel.org, Roman Zippel <zippel@linux-m68k.org>,
	"Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	Dave Kleikamp <shaggy@linux.vnet.ibm.com>,
	Bob Copeland <me@bobcopeland.com>,
	reiserfs-devel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Evgeniy Dushistov <dushistov@mail.ru>, Jan Kara <jack@suse.cz>
Subject: Re: [RFC][PATCH] Possible data integrity problems in lots of filesystems?
Date: Thu, 25 Nov 2010 22:47:11 +1100
Message-ID: <20101125114711.GA3622@amd>
In-Reply-To: <4CEE3F9F.9070108@panasas.com>

On Thu, Nov 25, 2010 at 12:51:11PM +0200, Boaz Harrosh wrote:
> On 11/25/2010 12:06 PM, Nick Piggin wrote:
> > On Thu, Nov 25, 2010 at 11:28:14AM +0200, Boaz Harrosh wrote:
>
> >>> Index: linux-2.6/fs/exofs/file.c
> >>> ===================================================================
> >>> --- linux-2.6.orig/fs/exofs/file.c 2010-11-19 16:50:00.000000000 +1100
> >>> +++ linux-2.6/fs/exofs/file.c 2010-11-19 16:50:07.000000000 +1100
> >>> @@ -48,11 +48,6 @@ static int exofs_file_fsync(struct file
> >>> struct inode *inode = filp->f_mapping->host;
> >>> struct super_block *sb;
> >>>
> >>> - if (!(inode->i_state & I_DIRTY))
> >>> - return 0;
> >>> - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
> >>> - return 0;
> >>> -
> >>> ret = sync_inode_metadata(inode, 1);
> >>>
> >>> /* This is a good place to write the sb */
> >>>
> >>
> >> Is that a good enough fix for the issue in your opinion?
> >> Or is there more involved?
> >
> > For the inode dirty bit race problem, yes it should fix it.
> > sync_inode_metadata basically makes the same checks without
> > races (in a subsequent patch I re-introduced the datasync
> > optimisation).
> >
> >
>
> >
> > Well in your fsync, you need to wait for inode writeback
> > that might have been started by an asynchronous write_inode.
> >
>
> All I'm calling is sync_inode_metadata(,1) which calls sync_inode()
> which calls writeback_single_inode(sync_mode == WB_SYNC_ALL). It gets
> a little complicated but from the looks of it, even though the
> call to .write_inode() is not under any lock the state machine there
> will do inode_wait_for_writeback() if there was one in motion
> already?
>
> And it looks like writeback_single_inode() does all the proper
> checks in the correct order for these flags above.
>
> So current code in exofs_file_fsync() looks scary to me. I would
> like to push your above patch for this kernel. (I'll repost it)

That does not get it right, because of the situation I described
above. Background writeout can come in first, clear the inode
dirty bits, and call your ->write_inode for async writeout.
That means you skip the exofs_put_io_state(), and (I presume)
that means you aren't waiting for write completion there.

What then happens is that sync_inode_metadata() from your fsync
does not call ->write_inode, because the inode dirty bits are already
clear. It's basically a no-op. So you need to either make your
->write_inode always synchronous, or wait for it in your ->fsync
and ->sync_fs.

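To make the ordering concrete, here is a toy userspace model of it.
This is only a sketch, not kernel code: every name in it (toy_inode,
toy_background_writeout, toy_fsync_dirty_only and so on) is made up
for illustration, and it assumes, as presumed above, that the async
->write_inode path returns without waiting for the write to complete.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_inode {
	bool dirty;        /* stands in for the I_DIRTY bits in i_state */
	bool io_in_flight; /* async write submitted, completion pending */
	bool on_disk;      /* set once the metadata write has completed */
};

/* Models the I/O completing (or a caller waiting until it has). */
static void toy_io_complete(struct toy_inode *inode)
{
	if (inode->io_in_flight) {
		inode->io_in_flight = false;
		inode->on_disk = true;
	}
}

/* Background writeout: clears the dirty bit, submits, does not wait. */
static void toy_background_writeout(struct toy_inode *inode)
{
	if (!inode->dirty)
		return;
	inode->dirty = false;
	inode->io_in_flight = true;
}

/* Synchronous write: clears the dirty bit, submits, and waits. */
static void toy_write_inode_sync(struct toy_inode *inode)
{
	inode->dirty = false;
	inode->io_in_flight = true;
	toy_io_complete(inode);
}

/* fsync that acts only on the dirty bit: a no-op if it is clear. */
static void toy_fsync_dirty_only(struct toy_inode *inode)
{
	if (inode->dirty)
		toy_write_inode_sync(inode);
}

/* fsync that also waits for a write started earlier by someone else. */
static void toy_fsync_and_wait(struct toy_inode *inode)
{
	if (inode->dirty)
		toy_write_inode_sync(inode);
	toy_io_complete(inode);
}

/*
 * Background writeout runs first, then fsync. Returns whether the
 * metadata is on disk at the point fsync returns.
 */
bool toy_scenario(bool wait_for_in_flight)
{
	struct toy_inode inode = { .dirty = true };

	toy_background_writeout(&inode);
	if (wait_for_in_flight)
		toy_fsync_and_wait(&inode);
	else
		toy_fsync_dirty_only(&inode);
	return inode.on_disk;
}
```

In this model toy_scenario(false) returns false: fsync sees a clean
inode and returns while the write is still in flight. With
toy_scenario(true) the fsync waits for the earlier write and the
result is true.
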
> > Also, with your sync_inode_metadata call, you shouldn't need the
> > sync_inode call by the looks.
> >
>
> What? I didn't follow. You mean I don't need to sync_inode_metadata(,wait==1),
> or what did you mean?

Sorry, I was looking at the wrong code; ignore that.

Nick