[PATCH, RFC 0/2] Mitigate fsync's with ext3's data=ordered mode

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Theodore Ts'o <tytso@mit.edu>
To: linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: [PATCH, RFC 0/2] Mitigate fsync's with ext3's data=ordered mode
Date: Sat, 21 Mar 2009 21:12:08 -0400	[thread overview]
Message-ID: <1237684330-11770-1-git-send-email-tytso@mit.edu> (raw)

Given the recent hoo-hah about ext4 and delayed allocation, I reviewed
the history of the Firefox 3.0 bug here:

    https://bugzilla.mozilla.org/show_bug.cgi?id=421482

Reports of the fsync() getting delayed by up to 30 seconds didn't make
sense to me, since there shouldn't be that much data waiting to be
flushed out, even if there was a very heavy write-intensive job
writing multiple gigabytes to the file.  When I looked more closely,
it became clear that what was really going on was a *read* intensive
job that was starving writes, due to the fact that the writes
submitted from the journal are using WRITE instead of WRITE_SYNC, and
I/O schedulers tend to prioritize reads ahead of writes.

This also explains why Aryan's patch which forced a higher I/O
priority for kjournald was helpful.  This is a better approach, since
we only force journal blocks out using WRITE_SYNC if the transaction
was triggered by something synchronous, such as an fsync() call, or a
file descriptor opened with O_SYNC.  The first patch does cause data
blocks forced out using data=ordered to be written out using
WRITE_SYNC even if the commit kicked off due to the 5 second commit
interval --- however, it does make the right thing happen when the
blocks are being forced out due to fsync() or fdatasync(), when before
the writes were being submitted without being marked as synchronous
writes.  

If it is considered highly objectionable that asynchronous commits
will result in WRITE_SYNC writes, we could add a new flag to the wbc
structure which could be passed all the way down to
block_write_full_page().  On the other hand, in the long run it's
better that commit complete sooner rather than later, since a
subsequent transaction could end up blocked behind the current
transaction, and that subsequent transaction could be a synchronous
one blocking an fsync() or some other synchronous operation.

I've done experiments with and without these patches, and it
definitely helps fsync() latency from between when there is a heavy
read intensive job starving the writes by about 75%.  The workload I
used was a tar command; I suspect if I had used a dd of a really huge
file, the fsync times without the patch would be even worse, and the
concommittent improvements would be even better.

						- Ted

next             reply	other threads:[~2009-03-22  1:12 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-22  1:12 Theodore Ts'o [this message]
2009-03-22  1:12 ` [PATCH, RFC 1/2] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
2009-03-22  1:12   ` [PATCH, RFC 2/2] ext3: Use WRITE_SYNC for commits which are caused by fsync() Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1237684330-11770-1-git-send-email-tytso@mit.edu \
    --to=tytso@mit.edu \
    --cc=akpm@linux-foundation.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).