linux-fsdevel.vger.kernel.org archive mirror
From: Ric Wheeler <ric@emc.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>,
	sct@redhat.com, linux-fsdevel@vger.kernel.org
Subject: Re: [JBD] change batching logic to improve O_SYNC performance
Date: Thu, 15 Dec 2005 16:39:12 -0500
Message-ID: <43A1E280.9080609@emc.com>
In-Reply-To: <20051215155552.1f71a16e.akpm@osdl.org>

Andrew Morton wrote:
> Benjamin LaHaise <bcrl@kvack.org> wrote:
> 
>>Hello folks,
>>
>>When writing files out using O_SYNC, jbd's 1 jiffy delay results in a 
>>significant drop in throughput as the disk sits idle.  The patch below 
>>results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on 
>>my IDE test box) when writing out files using O_SYNC.
> 
> 
> That's really sad.   Thanks for working that out.
> 
> 
>> Instead of always 
>>delaying for 1 jiffy when trying to batch, merely do a yield() to allow 
>>other processes to execute and potentially batch requests.
> 
> 
> Yeah, 2.4 has yield().  The O(1) yield semantics resulted in a performance
> catastrophe in ext3 when the system was busy, so the batching code got
> changed to a one-jiffy-sleep.  I don't think we can go back to yield().
> 
> Worst-case we should just dump the batching code: single-threaded
> O_SYNC/fsync is probably a commoner case than multi-threaded, dunno.

I think that the above assumption might be true for a single-threaded 
O_SYNC process, but it is not normally true for fsync()-heavy workloads.

We run a multi-threaded write workload because it boosts files/sec to 
about 4-5x the single-threaded write rate.  Using a properly 
configured write barrier (highly recommended if you care about your data 
;-)) makes each fsync() call quite expensive, so batching is a huge win.

I think that NFS servers and other multi-threaded apps (mail servers?) 
might have a similar profile.  In these cases, you definitely benefit from 
combining multiple fsync() requests into one disk operation.

> 
> But surely we can do better than that.
> 
> How's about something simple like just saying "if the last process which
> did a synchronous write is not this process, do the batching thing".
> 
> 
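To make the quoted suggestion concrete, here is a minimal userspace sketch of the "last synchronous writer" test. It is not JBD code: `struct sync_batch_state` and `should_batch()` are hypothetical names, and treating the very first writer as "do not batch" is an assumption, since the suggestion does not say what to do when there is no previous writer.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical per-journal state; not a real JBD structure. */
struct sync_batch_state {
    pid_t last_sync_pid;   /* pid of the last synchronous writer, 0 if none yet */
};

/*
 * Returns true if the caller should wait briefly so that other
 * synchronous writers can join the same commit.  Batching is only
 * attempted when the previous synchronous write came from a different
 * process.  Assumption: with no previous writer, do not batch.
 */
bool should_batch(struct sync_batch_state *st, pid_t pid)
{
    bool batch = (st->last_sync_pid != 0 && st->last_sync_pid != pid);

    st->last_sync_pid = pid;   /* remember the caller for the next request */
    return batch;
}
```

With this test a lone O_SYNC writer never waits, while two or more writers that alternate will always take the batching path.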

Despite some obvious complexity, I still think that adjusting the delay 
based on the rate of the synchronous requests would be the best approach.  
For example, even in the O_SYNC write case, if you have a single thread 
writing to disk in rapid succession, any delay is probably a waste.
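
One way such a rate-based delay could look is sketched below, in userspace C rather than kernel code. Everything in it is an assumption made for illustration: the names `struct sync_rate` and `batch_delay_us()`, the 4000 microsecond cap, and the 1/4 smoothing weight. The idea is to wait roughly as long as other threads have recently taken to show up, and not at all when a single thread issues requests back to back or when other threads arrive too rarely to be worth waiting for.

```c
#include <stdint.h>

#define MAX_BATCH_DELAY_US 4000u   /* hypothetical: never wait this long or longer */

/* Hypothetical bookkeeping for synchronous requests; zero-initialize it. */
struct sync_rate {
    uint64_t last_us;     /* arrival time of the previous sync request */
    uint64_t avg_gap_us;  /* smoothed gap between requests from different threads */
    uint32_t last_tid;    /* thread that issued the previous request */
    int      seen;        /* nonzero once one request has been recorded */
    int      have_gap;    /* nonzero once avg_gap_us holds a sample */
};

/* Returns how long (in microseconds) the caller should wait for company. */
uint64_t batch_delay_us(struct sync_rate *r, uint32_t tid, uint64_t now_us)
{
    uint64_t delay = 0;

    if (r->seen && tid != r->last_tid) {
        uint64_t gap = now_us - r->last_us;

        /* Exponential moving average with weight 1/4 on the new sample. */
        r->avg_gap_us = r->have_gap ? (3 * r->avg_gap_us + gap) / 4 : gap;
        r->have_gap = 1;

        /* Only wait if another thread is likely to arrive soon. */
        if (r->avg_gap_us < MAX_BATCH_DELAY_US)
            delay = r->avg_gap_us;
    } else if (r->seen) {
        /* Same thread back to back: nobody to batch with, drop the history. */
        r->have_gap = 0;
    }

    r->seen = 1;
    r->last_tid = tid;
    r->last_us = now_us;
    return delay;
}
```

A caller would pass a monotonic timestamp and sleep for the returned time before committing; a return value of 0 means commit immediately.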

Another way to attack this is to actually expose some of the 
transaction mechanisms to the applications so they can exercise explicit 
control over the commit phase, which could be used to build a batched 
fsync(), etc.


