* [PATCH] jbd2: Use WRITE_SYNC in journal checkpoint.
@ 2011-06-07 3:49 Tao Ma
2011-06-14 15:22 ` Tao Ma
2011-06-27 16:41 ` Ted Ts'o
0 siblings, 2 replies; 3+ messages in thread
From: Tao Ma @ 2011-06-07 3:49 UTC
To: linux-ext4; +Cc: Jan Kara, Theodore Ts'o
From: Tao Ma <boyu.mt@taobao.com>
In journal checkpoint, we write the buffers and wait for the writes to finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, the checkpoint write requests are delayed for a long time and
all the tasks which are waiting for journal space end up with errors like:
INFO: task attr_set:3816 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
attr_set D ffff880028393480 0 3816 1 0x00000000
ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
Call Trace:
[<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
[<ffffffff8103caad>] ? need_resched+0x23/0x2d
[<ffffffff814006a6>] ? thread_return+0xa2/0xbc
[<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
[<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
[<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
[<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
[<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
[<ffffffff81400b2d>] mutex_lock+0x1b/0x32
[<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
[<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
[<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
[<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
[<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
[<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
[<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
[<ffffffff81145adb>] generic_setxattr+0x6b/0x76
[<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
[<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
[<ffffffff81146c88>] setxattr+0xb5/0xe8
[<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
[<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
[<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
So this patch uses WRITE_SYNC in __flush_batch so that the requests are
moved into the sync queue and handled by cfq in a timely manner. We also use
the new plug, so that all the WRITE_SYNC requests are submitted as a whole
when we unplug.
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
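A minimal userspace sketch of the starvation described in the changelog,
for illustration only. It is not CFQ's real algorithm: it assumes a
dispatcher that serves one request per tick and always prefers sync
requests over async ones. The names ticks_until_served, REQ_SYNC and
REQ_ASYNC are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical request classes for this toy model (not kernel flags). */
enum req_class { REQ_ASYNC, REQ_SYNC };

/*
 * Toy model: one request is dispatched per tick and sync requests are
 * always served before async ones. Our request is queued behind
 * `queued_sync` sync requests, and `sync_arrivals_per_tick` new sync
 * requests arrive after each dispatch.
 *
 * Returns the tick at which our request is dispatched, or -1 if it is
 * still waiting after `max_ticks` ticks.
 */
int ticks_until_served(enum req_class cls, int queued_sync,
		       int sync_arrivals_per_tick, int max_ticks)
{
	int ahead = queued_sync;	/* requests that go before ours */
	int tick;

	assert(max_ticks > 0);

	for (tick = 1; tick <= max_ticks; tick++) {
		if (ahead == 0)
			return tick;	/* nothing ahead: we are dispatched */
		ahead--;		/* one sync request is dispatched */
		/*
		 * A sync request keeps its place in line, but later sync
		 * arrivals still jump ahead of an async request.
		 */
		if (cls == REQ_ASYNC)
			ahead += sync_arrivals_per_tick;
	}
	return -1;
}
```

In this model, a request tagged sync behind 8 queued sync requests is
served after those 8 are dispatched, while the same request tagged async
is never served as long as at least one new sync request arrives per tick.
That is the behaviour the hung task report above suggests for checkpoint
writes submitted with WRITE.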
fs/jbd2/checkpoint.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 6a79fd0..b372ea2 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -254,9 +254,12 @@ static void
 __flush_batch(journal_t *journal, int *batch_count)
 {
 	int i;
+	struct blk_plug plug;
 
+	blk_start_plug(&plug);
 	for (i = 0; i < *batch_count; i++)
-		write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE);
+		write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE_SYNC);
+	blk_finish_plug(&plug);
 
 	for (i = 0; i < *batch_count; i++) {
 		struct buffer_head *bh = journal->j_chkpt_bhs[i];
--
1.7.4
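
The effect of wrapping the submission loop in a plug can be sketched in
userspace as follows. This is a simplified model under stated assumptions,
not the block layer's implementation: toy_queue, toy_plug and the toy_*
functions are hypothetical stand-ins, and a single global pointer stands in
for the per-task plug that the kernel keeps.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PLUG_MAX 64

/* Hypothetical device queue: counts how requests reach it. */
struct toy_queue {
	int dispatches;		/* times requests were handed to the queue */
	int requests;		/* total requests received */
};

/* Hypothetical plug: holds requests back until it is finished. */
struct toy_plug {
	int pending[TOY_PLUG_MAX];
	int count;
};

/* Stand-in for the kernel's per-task plug pointer (an assumption here). */
static struct toy_plug *current_plug;

/* Hand everything held in the plug to the queue as one batch. */
static void toy_flush(struct toy_queue *q, struct toy_plug *plug)
{
	if (plug->count == 0)
		return;
	q->dispatches++;
	q->requests += plug->count;
	plug->count = 0;
}

void toy_start_plug(struct toy_plug *plug)
{
	assert(plug != NULL);
	plug->count = 0;
	current_plug = plug;
}

void toy_submit(struct toy_queue *q, int block)
{
	if (current_plug) {
		/* Plugged: hold the request, flushing first if full. */
		if (current_plug->count == TOY_PLUG_MAX)
			toy_flush(q, current_plug);
		current_plug->pending[current_plug->count++] = block;
		return;
	}
	/* Unplugged: every request reaches the queue on its own. */
	q->dispatches++;
	q->requests++;
}

void toy_finish_plug(struct toy_queue *q, struct toy_plug *plug)
{
	toy_flush(q, plug);
	current_plug = NULL;
}
```

Submitting a batch of buffers between toy_start_plug() and
toy_finish_plug() makes them reach the queue as a single batch instead of
one at a time, which mirrors the intent of the blk_start_plug() and
blk_finish_plug() pair in the patch.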
* Re: [PATCH] jbd2: Use WRITE_SYNC in journal checkpoint.
2011-06-07 3:49 [PATCH] jbd2: Use WRITE_SYNC in journal checkpoint Tao Ma
@ 2011-06-14 15:22 ` Tao Ma
2011-06-27 16:41 ` Ted Ts'o
1 sibling, 0 replies; 3+ messages in thread
From: Tao Ma @ 2011-06-14 15:22 UTC
To: linux-ext4; +Cc: Jan Kara, Theodore Ts'o
Hi Ted,
Any comments on this patch? Jan is OK with it and is waiting for the jbd2
part to be committed first.
Thanks.
Tao
On 06/07/2011 11:49 AM, Tao Ma wrote:
> From: Tao Ma <boyu.mt@taobao.com>
>
> In journal checkpoint, we write the buffers and wait for the writes to finish.
> But in cfq, the async queue has a very low priority, and in our test,
> if there are too many sync queues and every queue is filled up with
> requests, the checkpoint write requests are delayed for a long time and
> all the tasks which are waiting for journal space end up with errors like:
>
> INFO: task attr_set:3816 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> attr_set D ffff880028393480 0 3816 1 0x00000000
> ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
> ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
> ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
> Call Trace:
> [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
> [<ffffffff8103caad>] ? need_resched+0x23/0x2d
> [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
> [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
> [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
> [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
> [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
> [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
> [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
> [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
> [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
> [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
> [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
> [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
> [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
> [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
> [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
> [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
> [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
> [<ffffffff81146c88>] setxattr+0xb5/0xe8
> [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
> [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
> [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
>
> So this patch uses WRITE_SYNC in __flush_batch so that the requests are
> moved into the sync queue and handled by cfq in a timely manner. We also use
> the new plug, so that all the WRITE_SYNC requests are submitted as a whole
> when we unplug.
>
> Cc: Jan Kara <jack@suse.cz>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Reported-by: Robin Dong <sanbai@taobao.com>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
> fs/jbd2/checkpoint.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 6a79fd0..b372ea2 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -254,9 +254,12 @@ static void
> __flush_batch(journal_t *journal, int *batch_count)
> {
> int i;
> + struct blk_plug plug;
>
> + blk_start_plug(&plug);
> for (i = 0; i < *batch_count; i++)
> - write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE);
> + write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE_SYNC);
> + blk_finish_plug(&plug);
>
> for (i = 0; i < *batch_count; i++) {
> struct buffer_head *bh = journal->j_chkpt_bhs[i];
* Re: [PATCH] jbd2: Use WRITE_SYNC in journal checkpoint.
2011-06-07 3:49 [PATCH] jbd2: Use WRITE_SYNC in journal checkpoint Tao Ma
2011-06-14 15:22 ` Tao Ma
@ 2011-06-27 16:41 ` Ted Ts'o
1 sibling, 0 replies; 3+ messages in thread
From: Ted Ts'o @ 2011-06-27 16:41 UTC
To: Tao Ma; +Cc: linux-ext4, Jan Kara
On Tue, Jun 07, 2011 at 11:49:19AM +0800, Tao Ma wrote:
> From: Tao Ma <boyu.mt@taobao.com>
>
> In journal checkpoint, we write the buffers and wait for the writes to finish.
> But in cfq, the async queue has a very low priority, and in our test,
> if there are too many sync queues and every queue is filled up with
> requests, the checkpoint write requests are delayed for a long time and
> all the tasks which are waiting for journal space end up with errors like:
Thanks, added to the ext4 patch tree.
- Ted