From: Mingming <cmm@us.ibm.com>
To: Jan Kara <jack@suse.cz>
Cc: Theodore Tso <tytso@mit.edu>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
linux-ext4@vger.kernel.org
Subject: Re: fsync on ext[34] working only by an accident
Date: Thu, 10 Sep 2009 13:14:46 -0700 [thread overview]
Message-ID: <1252613686.7006.5.camel@mingming-laptop> (raw)
In-Reply-To: <20090910140636.GJ607@duck.suse.cz>
On Thu, 2009-09-10 at 16:06 +0200, Jan Kara wrote:
> On Thu 10-09-09 09:10:07, Theodore Tso wrote:
> > On Thu, Sep 10, 2009 at 04:34:55PM +0530, Aneesh Kumar K.V wrote:
> > > mark_buffer_dirty -> __set_page_dirty -> __mark_inode_dirty
> >
> > We need to be careful here. First of all, mark_buffer_dirty() on the
> > code paths you are talking about is being passed a metadata buffer
> > head. As such, has Jan has pointed out, the bh is part of the buffer
> > cache, so the page->mapping of associated with bh->b_page is the inode
> > of the block device --- *not* the ext4 inode.
> >
> > Secondly, __set_page_dirty calls __mark_inode_dirty passing in
> > I_DIRTY_PAGES --- which should be a hint. What Jan is talking about
> > is where we set the inode flags I_DIRTY_SYNC and I_DIRTY_DATASYNC:
> >
> > * I_DIRTY_SYNC Inode is dirty, but doesn't have to be written on
> > * fdatasync(). i_atime is the usual cause.
> > * I_DIRTY_DATASYNC Data-related inode changes pending. We keep track of
> > * these changes separately from I_DIRTY_SYNC so that we
> > * don't have to write inode on fdatasync() when only
> > * mtime has changed in it.
> >
> > This is important because ext4_sync_file() (which is called by fsync()
> > and fdatasync()) uses this logic to determine whether or not to call
> > sync_inode(), which is what will force a commit when wbc.sync_mode is
> > set to WB_SYNC_ALL.
> Yes, this is exactly what I was trying to point out.
>
> > In fact, I think the problem is worse than Jan is pointing out,
> > because it's not enough that vfs_fq_alloc_space() is calling
> > mark_inode_dirty(), since that only sets I_DIRTY_SYNC. When we touch
> > i_size or i_block[], we need to make sure that I_DIRTY_DATASYNC is
> > set, so that fdatasync() will force a commit.
> Actually no. mark_inode_dirty() dirties inode with I_DIRTY ==
> (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES) so it happens to work.
> The fact that quota *could* dirty the inode with I_DIRTY_SYNC only
> is right but that's a separate issue.
>
> > I think the right thing to do is to create an
> > _ext[34]_mark_inode_dirty() which takes an extra argument, which
> > controls whether or not we set I_DIRTY_SYNC or I_DIRTY_DATASYNC. In
> > fact, most of the time, we want to be setting I_DIRTY_DATASYNC, so we
> > should probably have ext[34]_mark_inode_dirty() call
> > _ext[34]_mark_inode_dirty() with I_DIRTY_DATASYNC, and then create a
> > ext[34]_mark_inode_nodatasync() version passes in I_DIRTY_SYNC.
> >
> > This will cause pdflush to call ext4_write_inode() more frequently,
> > but pdflush calls write_inode with wait=0, and ext4_write_inode() is a
> > no-op in that case.
> Thinking about it, it won't work so easily. The problem is that when
> pdflush decides to write the inode, it unconditionally clears dirty flags.
> We could redirty the inode in write_inode() but that's IMHO too ugly to
> bear it.
I am a little confused here, so pdflush could found the dirty inodes
(due to quota) but it doesn't force journal comminit and write the inode
to disk?
Mingming
next prev parent reply other threads:[~2009-09-10 20:14 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-08 13:26 fsync on ext[34] working only by an accident Jan Kara
2009-09-10 6:46 ` Aneesh Kumar K.V
2009-09-10 8:50 ` Jan Kara
2009-09-10 9:04 ` Aneesh Kumar K.V
2009-09-10 9:15 ` Jan Kara
2009-09-10 9:15 ` Aneesh Kumar K.V
2009-09-10 10:52 ` Jan Kara
2009-09-10 11:04 ` Aneesh Kumar K.V
2009-09-10 12:32 ` Jan Kara
2009-09-10 13:10 ` Theodore Tso
2009-09-10 14:06 ` Jan Kara
2009-09-10 16:52 ` Theodore Tso
2009-09-14 16:00 ` Jan Kara
2009-09-10 20:14 ` Mingming [this message]
2009-09-14 15:25 ` Jan Kara
2009-09-10 16:25 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1252613686.7006.5.camel@mingming-laptop \
--to=cmm@us.ibm.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).