From: Tao Ma <tm@tao.ma>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-ext4@vger.kernel.org, stable@kernel.org,
Jan Kara <jack@suse.cz>, Jeff Moyer <jmoyer@redhat.com>,
Corrado Zoccolo <czoccolo@gmail.com>,
Jens Axboe <jaxboe@fusionio.com>
Subject: Re: [PATCH] jbd/2[stable only]: Use WRITE_SYNC_PLUG in journal_commit_transaction.
Date: Tue, 12 Jul 2011 23:19:06 +0800
Message-ID: <4E1C65EA.5060009@tao.ma>
In-Reply-To: <20110712123041.GC1293@redhat.com>
On 07/12/2011 08:30 PM, Vivek Goyal wrote:
> On Tue, Jul 12, 2011 at 06:43:51PM +0800, Tao Ma wrote:
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In commit 749ef9f8423, we use WRITE_SYNC instead of WRITE in
>> journal_commit_transaction. It puts a much heavier burden on
>> the disk, as the sequential writes can no longer be merged (see the blktrace below).
>
> Tao Ma,
>
> Few queries.
>
> - What's the workload you are using for this test.
A very simple one.
mkfs.ext4 -b 2048 /dev/sdx 10000000
sync
mount -t ext4 -o delalloc /dev/sdx /mnt/ext4
dd if=/dev/zero of=/mnt/ext4/a bs=1024K count=1
Then run blktrace immediately after the 'dd'. Once jbd2 starts its
commit, you will get the blktrace output you want.
>
> - Do you see any performance improvement by switching to WRITE_SYNC_PLUG.
I haven't done many tests yet. But I guess that under a heavy sync
workload we would suffer some extra latency if we dispatch these
sequential writes one by one. As I have said, Jens added plug/unplug in
.39, and now these sequential writes are dispatched as one request. Run
the same test case with 3.0-rcX and you will get the same result as with this patch.
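
To make the merging concrete, here is a toy user-space sketch. It is not
kernel code: struct toy_req and toy_merge() are hypothetical names made up
for this illustration, and the real block layer does considerably more. It
only back-merges requests whose sector ranges are contiguous, in
submission order:

```c
#include <stddef.h>

/* Hypothetical toy request: a run of sectors, like blktrace's "sector + count". */
struct toy_req {
	unsigned long long sector;   /* first sector of the request */
	unsigned int nr_sectors;     /* length in sectors */
};

/*
 * Back-merge contiguous requests in place, in submission order.
 * A request is folded into the previous one only when it starts
 * exactly where the previous one ends.
 * Returns the number of requests left after merging.
 */
static size_t toy_merge(struct toy_req *reqs, size_t n)
{
	size_t out = 0;
	size_t i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		if (reqs[out].sector + reqs[out].nr_sectors == reqs[i].sector) {
			/* Contiguous: grow the current request. */
			reqs[out].nr_sectors += reqs[i].nr_sectors;
		} else {
			/* Gap: start a new request. */
			out++;
			reqs[out] = reqs[i];
		}
	}
	return out + 1;
}
```

Fed the six 4-sector writes from the "without the patch" trace (461101317
through 461101337), this collapses them into a single 461101317 + 24
request, which is what the "with this patch" trace shows.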
>
> - Why writes are not being merged? Because request got dispatched
> immediately? Do you have logs for insertion of requests also.
You can get it from the above test case.
>
> - WRITE_SYNC_PLUG will plug the queue and expects an explicit unplug. Who
> is doing unplug in this case?
See the comment I removed: "we rely on sync_buffer() doing the unplug
for us". I removed it because all of these writes are plugged now.
>
> - I am not sure in how many cases we are expecting to submit multiple
> sequential write here.
Every journal write here causes one sequential write to be split into many
requests, so I think it matters a lot for metadata-heavy tests.
Thanks
Tao
>
> Thanks
> Vivek
>
>>
>> Given the description of that commit 749ef9f8423, the reason why
>> we use WRITE_SYNC is that it wants to use REQ_NOIDLE and WRITE_SYNC_PLUG
>> also has that flag, so use WRITE_SYNC_PLUG instead. From blktrace,
>> we can get that:
>>
>> without the patch:
>> 8,0 6 18 0.016058423 3342 D W 461101317 + 4 [jbd2/sda11-8]
>> 8,0 6 19 0.016065473 3342 D W 461101321 + 4 [jbd2/sda11-8]
>> 8,0 6 20 0.016070751 3342 D W 461101325 + 4 [jbd2/sda11-8]
>> 8,0 6 21 0.016076180 3342 D W 461101329 + 4 [jbd2/sda11-8]
>> 8,0 6 22 0.016081255 3342 D W 461101333 + 4 [jbd2/sda11-8]
>> 8,0 6 23 0.016085963 3342 D W 461101337 + 4 [jbd2/sda11-8]
>> 8,0 6 24 0.016182048 0 C W 461101317 + 4 [0]
>> 8,0 6 25 0.016190820 0 C W 461101325 + 4 [0]
>> 8,0 6 26 0.016193927 0 C W 461101321 + 4 [0]
>> 8,0 6 27 0.016196532 0 C W 461101333 + 4 [0]
>> 8,0 6 28 0.016199180 0 C W 461101337 + 4 [0]
>> 8,0 6 29 0.016206180 0 C W 461101329 + 4 [0]
>>
>> with this patch:
>> 8,0 4 23 4.320315739 3129 D W 461101317 + 24 [jbd2/sda11-8]
>> 8,0 4 24 4.320364518 0 C W 461101317 + 24 [0]
>>
>> This only affects .37 and .38 since Jens' new plug patches are included
>> in .39 and the bug is removed as a side effect. But I think it is needed
>> anyway for the stable. And RHEL6 needs this also I guess.
>>
>> Cc: stable@kernel.org # 2.6.37 and 2.6.38
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Vivek Goyal <vgoyal@redhat.com>
>> Cc: Jeff Moyer <jmoyer@redhat.com>
>> Cc: Corrado Zoccolo <czoccolo@gmail.com>
>> Cc: Jens Axboe <jaxboe@fusionio.com>
>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>> ---
>> fs/jbd/commit.c | 9 +--------
>> fs/jbd2/commit.c | 9 +--------
>> 2 files changed, 2 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
>> index 34a4861..6d13df5 100644
>> --- a/fs/jbd/commit.c
>> +++ b/fs/jbd/commit.c
>> @@ -294,7 +294,7 @@ void journal_commit_transaction(journal_t *journal)
>> int first_tag = 0;
>> int tag_flag;
>> int i;
>> - int write_op = WRITE_SYNC;
>> + int write_op = WRITE_SYNC_PLUG;
>>
>> /*
>> * First job: lock down the current transaction and wait for
>> @@ -327,13 +327,6 @@ void journal_commit_transaction(journal_t *journal)
>> spin_lock(&journal->j_state_lock);
>> commit_transaction->t_state = T_LOCKED;
>>
>> - /*
>> - * Use plugged writes here, since we want to submit several before
>> - * we unplug the device. We don't do explicit unplugging in here,
>> - * instead we rely on sync_buffer() doing the unplug for us.
>> - */
>> - if (commit_transaction->t_synchronous_commit)
>> - write_op = WRITE_SYNC_PLUG;
>> spin_lock(&commit_transaction->t_handle_lock);
>> while (commit_transaction->t_updates) {
>> DEFINE_WAIT(wait);
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index f3ad159..fc3840f 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -329,7 +329,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>> int tag_bytes = journal_tag_bytes(journal);
>> struct buffer_head *cbh = NULL; /* For transactional checksums */
>> __u32 crc32_sum = ~0;
>> - int write_op = WRITE_SYNC;
>> + int write_op = WRITE_SYNC_PLUG;
>>
>> /*
>> * First job: lock down the current transaction and wait for
>> @@ -363,13 +363,6 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>> write_lock(&journal->j_state_lock);
>> commit_transaction->t_state = T_LOCKED;
>>
>> - /*
>> - * Use plugged writes here, since we want to submit several before
>> - * we unplug the device. We don't do explicit unplugging in here,
>> - * instead we rely on sync_buffer() doing the unplug for us.
>> - */
>> - if (commit_transaction->t_synchronous_commit)
>> - write_op = WRITE_SYNC_PLUG;
>> trace_jbd2_commit_locking(journal, commit_transaction);
>> stats.run.rs_wait = commit_transaction->t_max_wait;
>> stats.run.rs_locked = jiffies;
>> --
>> 1.7.4