All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] GFS2: dirty inode correctly in gfs2_write_end
Date: Thu, 05 Sep 2013 09:28:09 +0100	[thread overview]
Message-ID: <1378369689.2698.1.camel@menhir> (raw)
In-Reply-To: <20130905034835.GA16256@dhcp80-209.msp.redhat.com>

Hi,

On Wed, 2013-09-04 at 22:48 -0500, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2013 at 10:36:16PM +0100, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Wed, 2013-09-04 at 11:59 -0500, Benjamin Marzinski wrote:
> > > On Wed, Sep 04, 2013 at 03:55:03PM +0100, Steven Whitehouse wrote:
> > > > Hi,
> > > > 
> > > > On Tue, 2013-09-03 at 16:59 -0500, Benjamin Marzinski wrote:
> > > > > GFS2 was only setting I_DIRTY_DATASYNC on files that it wrote to, when
> > > > > it actually increased the file size.  If gfs2_fsync was called without
> > > > > I_DIRTY_DATASYNC set, it didn't flush the incore data to the log before
> > > > > returning, so any metadata or journaled data changes were not getting
> > > > > fsynced. This meant that writes to the middle of files were not always
> > > > > getting fsynced properly.
> > > > > 
> > > > > This patch makes gfs2 set I_DIRTY_DATASYNC whenever metadata has been
> > > > > updated during a write. It also make gfs2_sync flush the incore log
> > > > > if I_DIRTY_PAGES is set, and the file is using data journalling. This
> > > > > will make sure that all incore logged data gets written to disk before
> > > > > returning from a fsync.
> > > > > 
> > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > > ---
> > > > >  fs/gfs2/aops.c |    9 +++++++--
> > > > >  fs/gfs2/file.c |    4 +++-
> > > > >  2 files changed, 10 insertions(+), 3 deletions(-)
> > > > > 
> > > > > Index: gfs2-3.0-nmw-130830/fs/gfs2/aops.c
> > > > > ===================================================================
> > > > > --- gfs2-3.0-nmw-130830.orig/fs/gfs2/aops.c
> > > > > +++ gfs2-3.0-nmw-130830/fs/gfs2/aops.c
> > > > > @@ -815,6 +815,8 @@ static int gfs2_write_end(struct file *f
> > > > >  	unsigned int from = pos & (PAGE_CACHE_SIZE - 1);
> > > > >  	unsigned int to = from + len;
> > > > >  	int ret;
> > > > > +	struct gfs2_trans *tr = current->journal_info;
> > > > > +	BUG_ON(!tr);
> > > > >  
> > > > >  	BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
> > > > >  
> > > > > @@ -825,8 +827,6 @@ static int gfs2_write_end(struct file *f
> > > > >  		goto failed;
> > > > >  	}
> > > > >  
> > > > > -	gfs2_trans_add_meta(ip->i_gl, dibh);
> > > > > -
> > > > >  	if (gfs2_is_stuffed(ip))
> > > > >  		return gfs2_stuffed_write_end(inode, dibh, pos, len, copied, page);
> > > > >  
> > > > > @@ -834,6 +834,11 @@ static int gfs2_write_end(struct file *f
> > > > >  		gfs2_page_add_databufs(ip, page, from, to);
> > > > >  
> > > > >  	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
> > > > > +	if (tr->tr_num_buf_new)
> > > > > +		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> > > > > +	else
> > > > > +		gfs2_trans_add_meta(ip->i_gl, dibh);
> > > > > +
> > > > I'm not sure I follow this bit... generic_write_end() will call
> > > > mark_inode_dirty() if the file size has changed. So that case should be
> > > > covered. That leaves only timestamps which I think should be
> > > > I_DIRTY_SYNC since we don't need those for fdatasync. But why is there a
> > > > conditional here on tr_num_buf_new ? Does the "else" actually occur in
> > > > practice?
> > > 
> > > The idea is to only set I_DIRTY_DATASYNC if we've actually updated
> > > metadata because of the write.
> > 
> > I think we ought to be updating the metadata for every write (although
> > see case #3 below), since the timestamp should change each time. If the
> > size of the file doesn't change then we don't need to flush the metadata
> > on fdatasync, even though the timestamp has changed.
> > 
> > I think we have three cases:
> > 
> > 1. i_size changes: metadata must be flushed on fsync/fdatasync, so
> > I_DIRTY_DATASYNC should be set (done in generic_write_end())
> > 2. only timestamps change: metatdata must be flushed on fsync but not on
> > fdatasync, so I_DIRTY_SYNC should be set (I assume perhaps the
> > problematic case)
> > 3. write is of 0 bytes: I think we are allowed to skip all metadata
> > updates (need to check that we do this)
> 
> I wasn't counting timestamp changes, since like you mention, those use
> I_DIRTY_SYNC, and are ignored on fdatasync. What I care about is writes
> that cause block allocations.  The specific test in the bugzilla uses a
> holey file.  The writes that aren't getting synced ARE causing an
> allocation, but ARE NOT increasing i_size, since they are not at the end
> of the file.  This is a fourth case, and it's the case where we were
> falling down. I probably should have clarified that earlier.
> 
Ok, that makes perfect sense to me. Lets go with the patch "as is".
Sorry for not spotting that earlier.

> > 
> > >  Otherwise we flush the log for every
> > > write, even if we aren't data journalling and haven't changed any
> > > metadata.  If the write happened without adding any buffers to the
> > > transaction (which happens if we are just writing over already allocated
> > > space), then tr->tr_num_buf_new will be 0. This means we haven't
> > > actually dirtied any metadata, and we don't need to flush it to the log
> > > during the fsync.
> > > 
> > Does that happen for zero sized writes? Otherwise I'd be expecting us to
> > be updating time stamps at least on each write.
> 
> In rewriting already allocated data, we will update the timestamp and
> the inode will be added to the incore log, but we don't need to flush
> that change in gfs2_fsync, since we are asking for fdatasync. I thought
> the whole point of the seperation between I_DIRTY_SYNC and
> I_DIRTY_DATASYNC was so that you could avoid syncing inodes where only
> timestamps had changed. The change will still get to disk, but if we
> happen to crash before that, the mtime on the file might be wrong. Isn't
> that just the price you pay for doing a datasync instead of a full sync?
> 
Yes, that all sounds correct to me,

Steve.




  reply	other threads:[~2013-09-05  8:28 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-03 21:59 [Cluster-devel] [PATCH] GFS2: dirty inode correctly in gfs2_write_end Benjamin Marzinski
2013-09-04 13:24 ` Bob Peterson
2013-09-04 14:55 ` Steven Whitehouse
2013-09-04 16:59   ` Benjamin Marzinski
2013-09-04 21:36     ` Steven Whitehouse
2013-09-05  3:48       ` Benjamin Marzinski
2013-09-05  8:28         ` Steven Whitehouse [this message]
2013-09-05  9:34 ` Steven Whitehouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1378369689.2698.1.camel@menhir \
    --to=swhiteho@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.