* [PATCH, RFC 0/2] Mitigate fsync's with ext3's data=ordered mode
@ 2009-03-22 1:12 Theodore Ts'o
2009-03-22 1:12 ` [PATCH, RFC 1/2] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
0 siblings, 1 reply; 3+ messages in thread
From: Theodore Ts'o @ 2009-03-22 1:12 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm, Ext4 Developers List
Given the recent hoo-hah about ext4 and delayed allocation, I reviewed
the history of the Firefox 3.0 bug here:
https://bugzilla.mozilla.org/show_bug.cgi?id=421482
Reports of the fsync() getting delayed by up to 30 seconds didn't make
sense to me, since there shouldn't be that much data waiting to be
flushed out, even if there was a very heavy write-intensive job
writing multiple gigabytes to the file. When I looked more closely,
it became clear that what was really going on was a *read* intensive
job that was starving writes, due to the fact that the writes
submitted from the journal are using WRITE instead of WRITE_SYNC, and
I/O schedulers tend to prioritize reads ahead of writes.
This also explains why Aryan's patch which forced a higher I/O
priority for kjournald was helpful. This is a better approach, since
we only force journal blocks out using WRITE_SYNC if the transaction
was triggered by something synchronous, such as an fsync() call, or a
file descriptor opened with O_SYNC. The first patch does cause data
blocks forced out using data=ordered to be written out using
WRITE_SYNC even if the commit kicked off due to the 5 second commit
interval --- however, it does make the right thing happen when the
blocks are being forced out due to fsync() or fdatasync(), when before
the writes were being submitted without being marked as synchronous
writes.
If it is considered highly objectionable that asynchronous commits
will result in WRITE_SYNC writes, we could add a new flag to the wbc
structure which could be passed all the way down to
block_write_full_page(). On the other hand, in the long run it's
better that commit complete sooner rather than later, since a
subsequent transaction could end up blocked behind the current
transaction, and that subsequent transaction could be a synchronous
one blocking an fsync() or some other synchronous operation.
I've done experiments with and without these patches, and it
definitely helps fsync() latency from between when there is a heavy
read intensive job starving the writes by about 75%. The workload I
used was a tar command; I suspect if I had used a dd of a really huge
file, the fsync times without the patch would be even worse, and the
concommittent improvements would be even better.
- Ted
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH, RFC 1/2] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks
2009-03-22 1:12 [PATCH, RFC 0/2] Mitigate fsync's with ext3's data=ordered mode Theodore Ts'o
@ 2009-03-22 1:12 ` Theodore Ts'o
2009-03-22 1:12 ` [PATCH, RFC 2/2] ext3: Use WRITE_SYNC for commits which are caused by fsync() Theodore Ts'o
0 siblings, 1 reply; 3+ messages in thread
From: Theodore Ts'o @ 2009-03-22 1:12 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm, Ext4 Developers List, Theodore Ts'o
When doing synchronous writes because wbc->sync_mode is set to
WBC_SYNC_ALL, send the write request using WRITE_SYNC, so that we
don't unduly block system calls such as fsync().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
fs/buffer.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 3cc3a7d..4ea1bbb 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1755,6 +1755,7 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
struct buffer_head *bh, *head;
const unsigned blocksize = 1 << inode->i_blkbits;
int nr_underway = 0;
+ int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
BUG_ON(!PageLocked(page));
@@ -1846,7 +1847,7 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
do {
struct buffer_head *next = bh->b_this_page;
if (buffer_async_write(bh)) {
- submit_bh(WRITE, bh);
+ submit_bh(write_op, bh);
nr_underway++;
}
bh = next;
@@ -1900,7 +1901,7 @@ recover:
struct buffer_head *next = bh->b_this_page;
if (buffer_async_write(bh)) {
clear_buffer_dirty(bh);
- submit_bh(WRITE, bh);
+ submit_bh(write_op, bh);
nr_underway++;
}
bh = next;
--
1.5.6.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH, RFC 2/2] ext3: Use WRITE_SYNC for commits which are caused by fsync()
2009-03-22 1:12 ` [PATCH, RFC 1/2] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
@ 2009-03-22 1:12 ` Theodore Ts'o
0 siblings, 0 replies; 3+ messages in thread
From: Theodore Ts'o @ 2009-03-22 1:12 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm, Ext4 Developers List, Theodore Ts'o
If a commit is triggered by fsync(), set a flag indicating the journal
blocks associated with the transaction should be flushed out using
WRITE_SYNC.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
fs/jbd/commit.c | 23 +++++++++++++++--------
fs/jbd/transaction.c | 2 ++
include/linux/jbd.h | 5 +++++
3 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 3fbffb1..f8077b9 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -20,6 +20,7 @@
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
+#include <linux/bio.h>
/*
* Default IO end handler for temporary BJ_IO buffer_heads.
@@ -171,14 +172,15 @@ static int journal_write_commit_record(journal_t *journal,
return (ret == -EIO);
}
-static void journal_do_submit_data(struct buffer_head **wbuf, int bufs)
+static void journal_do_submit_data(struct buffer_head **wbuf, int bufs,
+ int write_op)
{
int i;
for (i = 0; i < bufs; i++) {
wbuf[i]->b_end_io = end_buffer_write_sync;
/* We use-up our safety reference in submit_bh() */
- submit_bh(WRITE, wbuf[i]);
+ submit_bh(write_op, wbuf[i]);
}
}
@@ -186,7 +188,8 @@ static void journal_do_submit_data(struct buffer_head **wbuf, int bufs)
* Submit all the data buffers to disk
*/
static int journal_submit_data_buffers(journal_t *journal,
- transaction_t *commit_transaction)
+ transaction_t *commit_transaction,
+ int write_op)
{
struct journal_head *jh;
struct buffer_head *bh;
@@ -225,7 +228,7 @@ write_out_data:
BUFFER_TRACE(bh, "needs blocking lock");
spin_unlock(&journal->j_list_lock);
/* Write out all data to prevent deadlocks */
- journal_do_submit_data(wbuf, bufs);
+ journal_do_submit_data(wbuf, bufs, write_op);
bufs = 0;
lock_buffer(bh);
spin_lock(&journal->j_list_lock);
@@ -256,7 +259,7 @@ write_out_data:
jbd_unlock_bh_state(bh);
if (bufs == journal->j_wbufsize) {
spin_unlock(&journal->j_list_lock);
- journal_do_submit_data(wbuf, bufs);
+ journal_do_submit_data(wbuf, bufs, write_op);
bufs = 0;
goto write_out_data;
}
@@ -286,7 +289,7 @@ write_out_data:
}
}
spin_unlock(&journal->j_list_lock);
- journal_do_submit_data(wbuf, bufs);
+ journal_do_submit_data(wbuf, bufs, write_op);
return err;
}
@@ -315,6 +318,7 @@ void journal_commit_transaction(journal_t *journal)
int first_tag = 0;
int tag_flag;
int i;
+ int write_op = WRITE;
/*
* First job: lock down the current transaction and wait for
@@ -347,6 +351,8 @@ void journal_commit_transaction(journal_t *journal)
spin_lock(&journal->j_state_lock);
commit_transaction->t_state = T_LOCKED;
+ if (commit_transaction->t_synchronous_commit)
+ write_op = WRITE_SYNC;
spin_lock(&commit_transaction->t_handle_lock);
while (commit_transaction->t_updates) {
DEFINE_WAIT(wait);
@@ -431,7 +437,8 @@ void journal_commit_transaction(journal_t *journal)
* Now start flushing things to disk, in the order they appear
* on the transaction lists. Data blocks go first.
*/
- err = journal_submit_data_buffers(journal, commit_transaction);
+ err = journal_submit_data_buffers(journal, commit_transaction,
+ write_op);
/*
* Wait for all previously submitted IO to complete.
@@ -660,7 +667,7 @@ start_journal_io:
clear_buffer_dirty(bh);
set_buffer_uptodate(bh);
bh->b_end_io = journal_end_buffer_io_sync;
- submit_bh(WRITE, bh);
+ submit_bh(write_op, bh);
}
cond_resched();
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index e6a1174..ed886e6 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1440,6 +1440,8 @@ int journal_stop(handle_t *handle)
}
}
+ if (handle->h_sync)
+ transaction->t_synchronous_commit = 1;
current->journal_info = NULL;
spin_lock(&journal->j_state_lock);
spin_lock(&transaction->t_handle_lock);
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index 64246dc..2c69431 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -552,6 +552,11 @@ struct transaction_s
*/
int t_handle_count;
+ /*
+ * This transaction is being forced and some process is
+ * waiting for it to finish.
+ */
+ int t_synchronous_commit:1;
};
/**
--
1.5.6.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2009-03-22 1:12 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-22 1:12 [PATCH, RFC 0/2] Mitigate fsync's with ext3's data=ordered mode Theodore Ts'o
2009-03-22 1:12 ` [PATCH, RFC 1/2] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
2009-03-22 1:12 ` [PATCH, RFC 2/2] ext3: Use WRITE_SYNC for commits which are caused by fsync() Theodore Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).