From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tao Ma Subject: Re: [PATCH] jbd2: Use WRITE_SYNC in journal checkpoint. Date: Tue, 14 Jun 2011 23:22:36 +0800 Message-ID: <4DF77CBC.8050102@tao.ma> References: <1307418559-3329-1-git-send-email-tm@tao.ma> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Jan Kara , Theodore Ts'o To: linux-ext4@vger.kernel.org Return-path: Received: from oproxy4-pub.bluehost.com ([69.89.21.11]:49114 "HELO oproxy4-pub.bluehost.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752975Ab1FNPWs (ORCPT ); Tue, 14 Jun 2011 11:22:48 -0400 In-Reply-To: <1307418559-3329-1-git-send-email-tm@tao.ma> Sender: linux-ext4-owner@vger.kernel.org List-ID: Hi Ted, Any comments for this patch? Jan is OK with it and wait for the jbd2 part to be committed first. Thanks. Tao On 06/07/2011 11:49 AM, Tao Ma wrote: > From: Tao Ma > > In journal checkpoint, we write the buffer and wait for its finish. > But in cfq, the async queue has a very low priority, and in our test, > if there are too many sync queues and every queue is filled up with > requests, the write request will be delayed for quite a long time and > all the tasks which are waiting for journal space will end with errors like: > > INFO: task attr_set:3816 blocked for more than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > attr_set D ffff880028393480 0 3816 1 0x00000000 > ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8 > ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080 > ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002 > Call Trace: > [] ? __dequeue_entity+0x33/0x38 > [] ? need_resched+0x23/0x2d > [] ? thread_return+0xa2/0xbc > [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2] > [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2] > [] __mutex_lock_common+0x14e/0x1a9 > [] ? brelse+0x13/0x15 [ext4] > [] __mutex_lock_slowpath+0x19/0x1b > [] mutex_lock+0x1b/0x32 > [] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2] > [] start_this_handle+0x438/0x527 [jbd2] > [] ? autoremove_wake_function+0x0/0x3e > [] jbd2_journal_start+0xa1/0xcc [jbd2] > [] ext4_journal_start_sb+0x57/0x81 [ext4] > [] ext4_xattr_set+0x6c/0xe3 [ext4] > [] ext4_xattr_user_set+0x42/0x4b [ext4] > [] generic_setxattr+0x6b/0x76 > [] __vfs_setxattr_noperm+0x47/0xc0 > [] vfs_setxattr+0x7f/0x9a > [] setxattr+0xb5/0xe8 > [] ? do_filp_open+0x571/0xa6e > [] sys_fsetxattr+0x6b/0x91 > [] system_call_fastpath+0x16/0x1b > > So this patch tries to use WRITE_SYNC in __flush_batch so that the request will > be moved into sync queue and handled by cfq timely. We also use the new plug, > sot that all the WRITE_SYNC requests can be given as a whole when we unplug it. > > Cc: Jan Kara > Cc: "Theodore Ts'o" > Reported-by: Robin Dong > Signed-off-by: Tao Ma > --- > fs/jbd2/checkpoint.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c > index 6a79fd0..b372ea2 100644 > --- a/fs/jbd2/checkpoint.c > +++ b/fs/jbd2/checkpoint.c > @@ -254,9 +254,12 @@ static void > __flush_batch(journal_t *journal, int *batch_count) > { > int i; > + struct blk_plug plug; > > + blk_start_plug(&plug); > for (i = 0; i < *batch_count; i++) > - write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE); > + write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE_SYNC); > + blk_finish_plug(&plug); > > for (i = 0; i < *batch_count; i++) { > struct buffer_head *bh = journal->j_chkpt_bhs[i];