linux-ext4.vger.kernel.org archive mirror
From: Ric Wheeler <rwheeler@redhat.com>
To: Andreas Dilger <adilger@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Josef Bacik <jbacik@redhat.com>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	linux-fsdevel@vger.kernel.org, chris.mason@oracle.com,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/2] improve ext3 fsync batching
Date: Tue, 19 Aug 2008 07:01:11 -0400
Message-ID: <48AAA7F7.5090501@redhat.com>
In-Reply-To: <20080819054414.GM3392@webber.adilger.int>

Andreas Dilger wrote:
> On Aug 18, 2008  21:31 -0700, Andrew Morton wrote:
>   
>> On Wed, 6 Aug 2008 15:15:36 -0400 Josef Bacik <jbacik@redhat.com> wrote:
>>     
>>> Using the following fs_mark command to test the speeds
>>>
>>> ./fs_mark -d /mnt/ext3-test -s 4096 -n 2000 -D 64 -t 2
>>>
>>> I got the following results (with write caching turned off)
>>>
>>> type	threads		with patch	without patch
>>> sata	2		26.4		27.8
>>> sata	4		44.6		44.4
>>> sata	8		70.4		72.8
>>> sata	16		75.2		89.6
>>> sata	32		92.7		96.0
>>> ram	1		2399.1		2398.8
>>> ram	2		257.3		3603.0
>>> ram	4		395.6		4827.9
>>> ram	8		659.0		4721.1
>>> ram	16		1326.4		4373.3
>>> ram	32		1964.2		3816.3
>>>
>>> I used a ramdisk to emulate a "fast" disk since I don't happen to have a
>>> clariion sitting around.  I didn't test single thread in the sata case as it
>>> should be relatively the same between the two.  Thanks,
>>>       
>> This is all a bit mysterious.  That delay doesn't have much at all to
>> do with commit times.  The code is looping around giving other
>> userspace processes an opportunity to get scheduled and to run an fsync
>> and to join the current transaction rather than having to start a new
>> one.
>>
>> (that code was quite effective when I first added it, but in more
>> recent testing, which was some time ago, it doesn't appear to provide
>> any improvement.  This needs to be understood)
>>     
>
> I don't think it is mysterious at all.  With a HZ=100 system 1 jiffy
> is 10ms, which was comparable to the seek time of a disk, so sleeping
> for 1 jiffy to avoid doing 2 transactions was a win.  With a flash
> device (simulated by RAM here) seek time is 1ms so waiting 10ms
> isn't going to be useful if there are only 2 threads and both have
> already joined the transaction.
>   

The code was originally tuned to S-ATA & ATA disk response times, which 
are closer to 12-15 ms. Sleeping for 10 ms (100 HZ kernel) or 4 ms (250 HZ) 
did not overly penalize the low thread count case and worked well for 
higher thread counts (and ext3 special-cases the single-threaded writer, 
so no sleep happens there).

This is still a really, really good thing to do, but we need to sleep 
less when the device characteristics are radically different. For 
example, a fibre channel attached disk array drops that 12-15 ms down to 
1.5 ms (not to mention RAM disks!).
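
As a rough illustration of the mismatch described above, the following
Python sketch compares the length of a one-jiffy sleep with a device's
response time. The function names are made up for this example, and the
latency figures are only the approximate numbers mentioned in this thread
(about 12 ms for an S-ATA disk, about 1.5 ms for a fibre channel array);
they are not measurements.

```python
def jiffy_ms(hz: int) -> float:
    """Length of one jiffy (timer tick) in milliseconds for a given HZ."""
    return 1000.0 / hz


def sleep_to_io_ratio(hz: int, io_latency_ms: float) -> float:
    """How many device response times fit into a one-jiffy sleep.

    A value well above 1 means the batching sleep costs more than the
    I/O it is trying to save.
    """
    return jiffy_ms(hz) / io_latency_ms


if __name__ == "__main__":
    # Approximate latencies taken from the discussion, not measured here.
    devices = {"S-ATA disk (~12 ms)": 12.0, "FC array (~1.5 ms)": 1.5}
    for hz in (100, 250):
        for name, latency in devices.items():
            ratio = sleep_to_io_ratio(hz, latency)
            print(f"HZ={hz:<4d} {name:<22s} sleep/io = {ratio:.2f}")
```

With a ratio below 1 (slow disk) the sleep is cheap relative to a second
transaction; with a ratio of several (fast array) the sleep dominates.
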

>
>> Also, I'd expect that the average commit time is much longer than one
>> jiffy on most disks, and perhaps even on fast disks and maybe even on
>> ramdisk.  So perhaps what's happened here is that you've increased the
>> sleep period and more tasks are joining particular transactions.
>>
>> Or you've shortened the sleep time (which wasn't really doing anything
>> useful) and this causes tasks to spend less time asleep.
>>     
>
> I think both are true.  By making the sleep time dynamic it removes
> the "useless" sleep time, but can also increase the sleep time if
> there are many threads and the commit cost is better amortized over
> more operations.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>   

It would be great to be able to use this batching technique for faster 
devices, but for an array we currently sleep 3-4 times longer waiting to 
batch than the transaction itself takes to complete.
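
One way to picture "sleeping less on faster devices" is to derive the
batching sleep from observed commit times instead of a fixed jiffy. The
sketch below is only an illustration of that idea under stated
assumptions: it is not the code from the patch, and the class name, the
weighting of the running average, and the 10 ms cap are all hypothetical
choices made for this example.

```python
class BatchSleepEstimator:
    """Hypothetical helper: tracks a running average of commit times
    and suggests how long an fsync caller might wait for others to join."""

    def __init__(self, max_sleep_us: int = 10_000):
        self.avg_commit_us = 0            # 0 means "no commit observed yet"
        self.max_sleep_us = max_sleep_us  # assumed cap, roughly 1 jiffy at HZ=100

    def record_commit(self, commit_us: int) -> None:
        """Fold one measured commit duration into the running average."""
        if self.avg_commit_us == 0:
            self.avg_commit_us = commit_us
        else:
            # Weighted average: 3/4 history, 1/4 newest sample.
            self.avg_commit_us = (3 * self.avg_commit_us + commit_us) // 4

    def suggested_sleep_us(self, writer_threads: int) -> int:
        """Sleep about one average commit time, never more than the cap.

        A single writer has nobody to batch with, so it does not sleep.
        """
        if writer_threads <= 1:
            return 0
        return min(self.avg_commit_us, self.max_sleep_us)
```

Usage: call `record_commit()` after each commit completes and
`suggested_sleep_us()` before waiting. With commits around 1.5 ms the
suggested wait stays near 1.5 ms, while commits of 12-15 ms are capped at
the assumed 10 ms ceiling.
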

Thanks!

Ric



