Re: [RFC v2] ext4: Don't send extra barrier during fsync if there are no dirty pages.

From: "Ted Ts'o" <tytso@mit.edu>
To: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>, Ric Wheeler <rwheeler@redhat.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Keith Mannthey <kmannth@us.ibm.com>,
	Mingming Cao <mcao@us.ibm.com>
Subject: Re: [RFC v2] ext4: Don't send extra barrier during fsync if there are no dirty pages.
Date: Thu, 5 Aug 2010 12:40:08 -0400
Message-ID: <20100805164008.GH2901@thunk.org>
In-Reply-To: <20100629205102.GM15515@tux1.beaverton.ibm.com>

On Tue, Jun 29, 2010 at 01:51:02PM -0700, Darrick J. Wong wrote:
> 
> This second version of the patch uses the inode state flags and
> (suboptimally) also catches directio writes.  It might be a better
> idea to try to coordinate all the barrier requests across the whole
> filesystem, though that's a bit more difficult.

Hi Darrick,

When I looked at this patch more closely, and thought about it hard,
the fact that this helps the FFSB mail server benchmark surprised me,
and then I realized it's because it doesn't really accurately emulate
a mail server at all.  Or at least, not an MTA.  In an MTA, only one CPU
will touch a queue file, so there should never be a case of a double
fsync to a single file.  This is why I was thinking about
coordinating barrier requests across the whole filesystem --- it helps
out in the case where you have all your CPU threads hammering
/var/spool/mqueue, or /var/spool/exim4/input, and where they are all
creating queue files, and calling fsync() in parallel.  This patch
won't help that case.

It will help the case of an MDA --- Mail Delivery Agent --- if you have
multiple e-mails all getting delivered at the same time into the same
/var/mail/<username> file, with an fsync() following after a mail
message is appended to the file.  This is a much rarer case, and I
can't think of any other workload where you will have multiple
processes racing against each other and fsync'ing the same inode.
Even in the MDA case, it's rare that you will have one mbox getting so
many deliveries that this case would be hit.
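
To make the MDA pattern concrete, here is a minimal userspace sketch of
one delivery: append a message to an mbox-style file, then fsync() it.
The helper name append_and_sync() is hypothetical, and mailbox locking,
which a real MDA would need, is left out.  Several processes running
this against the same path is the "same inode" case described above.

```c
/*
 * Userspace sketch of one MDA-style delivery: append, then fsync().
 * append_and_sync() is a hypothetical helper, not taken from any MDA.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int append_and_sync(const char *path, const char *msg)
{
	size_t left = strlen(msg);
	int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);

	if (fd < 0)
		return -1;

	/* write() may be short, so loop until the whole message is out. */
	while (left > 0) {
		ssize_t n = write(fd, msg, left);

		if (n < 0) {
			close(fd);
			return -1;
		}
		msg += n;
		left -= (size_t)n;
	}

	/* Each delivery ends with its own fsync() on the shared inode. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

An MTA-style workload would instead call something like this with a
different path per queue file, so no two fsync() calls hit the same
inode.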

So while I was thinking about accepting this patch, I now find myself
hesitating.  There _is_ a minor race in the patch that I noticed,
which I'll point out below, but that's easily fixed.  The bigger issue
is it's not clear this patch will actually make a difference in the
real world.  I am trying and failing to think of a real-life application
which is stupid enough to do back-to-back fsync commands, even if it's
because it has multiple threads all trying to write to the file and
fsync to it in an uncoordinated fashion.  It would be easy enough to
add instrumentation that would trigger a printk if the patch optimized
out a barrier --- and if someone can point out even one badly written
application --- whether it's mysql, postgresql, a GNOME or KDE
application, db2, Oracle, etc., I'd say sure.  But adding even a tiny
amount of extra complexity for something which is _only_ helpful for a
benchmark grates against my soul....
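
The instrumentation idea above could look roughly like the following
userspace model.  It is only a sketch: should_flush() and the
skipped_barriers counter are hypothetical names, and fprintf() stands in
for the printk a kernel version would use.  It logs each time the
"no dirty data" check would have optimized out a barrier.

```c
/*
 * Userspace model of "log when a barrier is optimized out".
 * should_flush() and skipped_barriers are hypothetical names.
 */
#include <stdbool.h>
#include <stdio.h>

unsigned long skipped_barriers;	/* how many barriers were skipped */

/* Returns true if the fsync path should issue a flush. */
bool should_flush(bool barriers_enabled, bool dirty_data, unsigned long ino)
{
	if (!barriers_enabled)
		return false;	/* nothing to optimize out */

	if (!dirty_data) {
		/* This is the case the patch optimizes; report it. */
		skipped_barriers++;
		fprintf(stderr,
			"fsync: skipped barrier for inode %lu (%lu so far)\n",
			ino, skipped_barriers);
		return false;
	}
	return true;
}
```

If the counter stays at zero under a real application, the optimization
is not being exercised by that workload.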

So if you can think of something, please point it out to me.  If it
would help ext4 users in real life, I'd be all for it.  But at this
point, I'm thinking that perhaps the real issue is that the mail
server benchmark isn't accurately reflecting a real life workload.

Am I missing something?

						- Ted

> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 592adf2..96625c3 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -130,8 +130,11 @@ int ext4_sync_file(struct file *file, int datasync)
>  			blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL,
>  					NULL, BLKDEV_IFL_WAIT);
>  		ret = jbd2_log_wait_commit(journal, commit_tid);
> -	} else if (journal->j_flags & JBD2_BARRIER)
> +	} else if (journal->j_flags & JBD2_BARRIER &&
> +		   ext4_test_inode_state(inode, EXT4_STATE_DIRTY_DATA)) {
>  		blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL,
>  			BLKDEV_IFL_WAIT);
> +		ext4_clear_inode_state(inode, EXT4_STATE_DIRTY_DATA);
> +	}
>  	return ret;

This is the minor race I was talking about; you should move the
ext4_clear_inode_state() call above blkdev_issue_flush().  If there is
a race, you want to fail safe, by accidentally issuing a second
barrier, instead of possibly skipping a barrier if a page gets dirtied
*after* the blkdev_issue_flush() has taken effect, but *before* we
have a chance to clear the EXT4_STATE_DIRTY_DATA flag.
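
The two orderings can be compared with a small single-threaded model.
This is a sketch, not kernel code: dirty_data stands in for
EXT4_STATE_DIRTY_DATA, model_flush() is a hypothetical stand-in for
blkdev_issue_flush(), and the write_races_flush field simulates a page
being dirtied right after the flush has taken effect.

```c
/*
 * Userspace model of the race; all names here are hypothetical.
 */
#include <stdbool.h>

struct model_inode {
	bool dirty_data;	/* models EXT4_STATE_DIRTY_DATA */
	bool write_races_flush;	/* a write lands just after the flush */
	int flushes;		/* number of modelled barriers issued */
};

/* Stand-in for blkdev_issue_flush(), with an optional racing write. */
void model_flush(struct model_inode *inode)
{
	inode->flushes++;
	if (inode->write_races_flush) {
		/* A page is dirtied after the flush took effect. */
		inode->dirty_data = true;
		inode->write_races_flush = false;
	}
}

/* Ordering in the quoted hunk: flush first, clear the flag after. */
void fsync_clear_after(struct model_inode *inode)
{
	if (inode->dirty_data) {
		model_flush(inode);
		inode->dirty_data = false;	/* loses the racing write */
	}
}

/* Suggested ordering: clear the flag first, then flush. */
void fsync_clear_before(struct model_inode *inode)
{
	if (inode->dirty_data) {
		inode->dirty_data = false;
		model_flush(inode);	/* racing write re-sets the flag */
	}
}
```

Starting from { dirty_data = true, write_races_flush = true } and
calling fsync twice, the clear-after variant issues one flush and skips
the one the racing write needed, while the clear-before variant issues
two.  The extra barrier is the fail-safe outcome.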

BTW, my apologies for not looking at this sooner, and giving you this
feedback earlier.  This summer has been crazy busy, and I didn't have
time until the merge window provided a forcing function to look at
outstanding patches.

