From: Markus Trippelsdorf <markus@trippelsdorf.de>
To: Tejun Heo <tj@kernel.org>
Cc: Michael Tokarev <mjt@tls.msk.ru>,
Robert Hancock <hancockrwd@gmail.com>,
Jeff Garzik <jgarzik@pobox.com>,
linux-ide@vger.kernel.org
Subject: Re: libata default FUA support
Date: Wed, 2 Mar 2011 18:29:40 +0100
Message-ID: <20110302172940.GA1644@gentoo.trippels.de>
In-Reply-To: <20110302085823.GI19669@htj.dyndns.org>
[-- Attachment #1: Type: text/plain, Size: 3946 bytes --]
On 2011.03.02 at 09:58 +0100, Tejun Heo wrote:
> On Wed, Mar 02, 2011 at 10:30:57AM +0300, Michael Tokarev wrote:
> > > I believe the way the block layer uses it, basically it only saves the
> > > overhead of one transaction to the drive. It might be significant on
> > > some workloads (especially on high IOPS drives like SSDs) but it's
> > > likely not a huge deal.
> >
> > One transaction per what? If it means an extra, especially "large"
> > transaction (like a flush with a wait) per each fsync-like call,
> > that can be a huge deal actually, especially on database-like
> > workloads (lots of small synchronous random writes).
>
> The way flushes are used by filesystems is that FUA is usually only
> used right after another FLUSH, i.e., using FUA replaces the FLUSH +
> commit block write + FLUSH sequence with FLUSH + FUA commit block
> write. Due to the preceding FLUSH, the cache is already empty, so the
> only difference between WRITE + FLUSH and FUA WRITE becomes the extra
> command issue overhead, which is usually almost unnoticeable compared
> to the actual IO.
>
> Another thing is that with the recent updates to block FLUSH handling,
> using FUA might even be less efficient. The new implementation
> aggressively merges those commit writes and flushes. IOW, depending
> on timing, multiple consecutive commit writes can be merged as,
>
> FLUSH + commit writes + FLUSH
>
> or
>
> FLUSH + some commit writes + FLUSH + other commit writes + FLUSH
>
> and so on.
>
> These merges will happen with fsync-heavy workloads where FLUSH
> performance actually matters and, in these scenarios, FUA writes are
> less effective because they put extra ordering restrictions on each
> FUA write, i.e., with surrounding FLUSHes, the drive is free to reorder
> commit writes to maximize performance; with FUA, the disk has to jump
> around all over the place to execute each command in the exact issue
> order.
>
> I personally think FUA is a misfeature. It's a microoptimization with
> shallow benefits even when used properly, while putting much heavier
> restrictions on actual IO order, which usually is the slow part.
Thanks for the detailed information. Just to confirm your point, here are
some benchmark results
(Seagate ST1500DL003 1.5TB 5900rpm, xfs (delaylog), ffsb
( http://sourceforge.net/projects/ffsb/ ) pure random write benchmark):
1)
Total Results 30sec run, 1 thread, 104*35MB files

        Op Name   Transactions  Trans/sec  % Trans   % Op Weight  Throughput
        =======   ============  =========  =======   ===========  ==========
FUA:    write :   435456        1183.44    100.000%  100.000%     162MB/sec
no FUA: write :   441600        1243.47    100.000%  100.000%     170MB/sec

System Call Latency statistics in millisecs

                 Min       Avg       Max           Total Calls
                 ========  ========  ========      ============
[ write] FUA     0.000000  0.070392  5444.638184   435456
[ write] no FUA  0.000000  0.069718  4715.519043   441600

2)
Total Results 240sec run, 2 threads, 104*35MB files
===============

        Op Name   Transactions  Trans/sec  % Trans   % Op Weight  Throughput
        =======   ============  =========  =======   ===========  ==========
FUA:    write :   594944        919.45     100.000%  100.000%     126MB/sec
no FUA: write :   653824        1097.31    100.000%  100.000%     150MB/sec

System Call Latency statistics in millisecs

                 Min       Avg       Max           Total Calls
                 ========  ========  ========      ============
[ write] FUA     0.000000  0.812704  13467.903320  594944
[ write] no FUA  0.000000  0.727761  9695.806641   653824
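
To put the two runs side by side, the relative difference can be worked
out from the Trans/sec columns above. A minimal Python sketch follows; the
dictionary layout and the function name are only illustrative and are not
part of ffsb or its output:

```python
# Relative slowdown of FUA vs. no FUA, using the Trans/sec figures
# reported in runs 1) and 2) above. Names here are illustrative only.
results = {
    "1 thread": {"fua": 1183.44, "no_fua": 1243.47},
    "2 threads": {"fua": 919.45, "no_fua": 1097.31},
}


def slowdown_percent(fua, no_fua):
    """Percentage by which the FUA rate falls short of the no-FUA rate."""
    return (no_fua - fua) / no_fua * 100.0


for name, rates in results.items():
    pct = slowdown_percent(rates["fua"], rates["no_fua"])
    print(f"{name}: FUA is {pct:.1f}% slower")
# prints:
# 1 thread: FUA is 4.8% slower
# 2 threads: FUA is 16.2% slower
```

These are single runs on one drive, so the percentages should be read as
a rough indication rather than a precise measurement.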
--
Markus
[-- Attachment #2: random_writes.ffsb --]
[-- Type: text/plain, Size: 968 bytes --]
# Large file random writes.
# 104 files, 35MB per file.
time=240
alignio=1
[filesystem0]
location=/var/tmp/fs_bench
num_files=104
min_filesize=36700160 # 35 MB
max_filesize=36700160
reuse=1
[end0]
[threadgroup0]
num_threads=2
write_random=1
write_weight=1
write_size=1048576 # 1 MB
write_blocksize=4096
[stats]
enable_stats=1
enable_range=1
msec_range 0.00 0.01
msec_range 0.01 0.02
msec_range 0.02 0.05
msec_range 0.05 0.10
msec_range 0.10 0.20
msec_range 0.20 0.50
msec_range 0.50 1.00
msec_range 1.00 2.00
msec_range 2.00 5.00
msec_range 5.00 10.00
msec_range 10.00 20.00
msec_range 20.00 50.00
msec_range 50.00 100.00
msec_range 100.00 200.00
msec_range 200.00 500.00
msec_range 500.00 1000.00
msec_range 1000.00 2000.00
msec_range 2000.00 5000.00
msec_range 5000.00 10000.00
[end]
[end0]
Thread overview: 6+ messages
2011-03-01 20:33 libata default FUA support Markus Trippelsdorf
2011-03-02 0:54 ` Robert Hancock
2011-03-02 7:30 ` Michael Tokarev
2011-03-02 8:58 ` Tejun Heo
2011-03-02 17:29 ` Markus Trippelsdorf [this message]
2011-03-03 4:33 ` Robert Hancock