* libata default FUA support
@ 2011-03-01 20:33 Markus Trippelsdorf
2011-03-02 0:54 ` Robert Hancock
0 siblings, 1 reply; 6+ messages in thread
From: Markus Trippelsdorf @ 2011-03-01 20:33 UTC (permalink / raw)
To: Jeff Garzik, linux-ide
FUA support is currently switched off by default in
drivers/ata/libata-core.c.
Given that many modern drives now support FUA, wouldn't it make sense
to switch it on by default, without requiring an (undocumented)
kernel/module parameter?
--
Markus
* Re: libata default FUA support
2011-03-01 20:33 libata default FUA support Markus Trippelsdorf
@ 2011-03-02 0:54 ` Robert Hancock
2011-03-02 7:30 ` Michael Tokarev
0 siblings, 1 reply; 6+ messages in thread
From: Robert Hancock @ 2011-03-02 0:54 UTC (permalink / raw)
To: Markus Trippelsdorf; +Cc: Jeff Garzik, linux-ide
On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
> FUA support is currently switched off by default in
> drivers/ata/libata-core.c.
> Given that many modern drives now support FUA, wouldn't it make sense
> to switch it on by default, without requiring an (undocumented)
> kernel/module parameter?
I believe I proposed this some time ago. Essentially all modern drives
should support FUA now, since it's part of the definition of the NCQ
(FPDMA) read/write commands. However, as I recall one of the objections
to enabling it was that since it's just a bit in a command, there's a
possibility that some drives may ignore it by accident or design, which
is less likely with an explicit cache flush command. I'm not very
inclined to agree myself (if you go down that road of pre-emptively
predicting drive implementer stupidity, where do you stop?) but that's
what was raised.
Another complication is that NCQ can be disabled at runtime either by
user request or by error-handling fallback, and not all drives that
support NCQ also support the FUA versions of the non-NCQ read/write
commands, so changes in NCQ enable status may also need to result in
changes in FUA support status on the block device.
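A minimal sketch of that dependency, written as a toy Python model rather than as libata code. The `Drive` fields and the `fua_usable` helper are hypothetical names introduced for illustration; the only rule encoded is the one described above (FUA is a bit in the NCQ write commands, while non-NCQ FUA writes need separate drive support).

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Toy capability model (hypothetical names, not libata structures)."""
    supports_ncq: bool          # drive implements NCQ (FPDMA) read/write
    supports_non_ncq_fua: bool  # drive implements the non-NCQ FUA write commands
    ncq_enabled: bool = True    # may be cleared by the user or by error handling


def fua_usable(drive: Drive) -> bool:
    """Return True if a FUA write could be issued in the drive's current state."""
    if drive.supports_ncq and drive.ncq_enabled:
        # FUA is a bit in the NCQ write command itself.
        return True
    # Without NCQ, FUA depends on the separate non-NCQ FUA commands.
    return drive.supports_non_ncq_fua


drive = Drive(supports_ncq=True, supports_non_ncq_fua=False)
before = fua_usable(drive)
drive.ncq_enabled = False       # e.g. error-handling fallback turns NCQ off
after = fua_usable(drive)
print(before, after)
```

In this model the answer changes when NCQ is switched off, which is why the FUA status advertised for the block device would have to be re-evaluated at that point.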
I believe the way the block layer uses it, basically it only saves the
overhead of one transaction to the drive. It might be significant on
some workloads (especially on high IOPS drives like SSDs) but it's
likely not a huge deal.
* Re: libata default FUA support
2011-03-02 0:54 ` Robert Hancock
@ 2011-03-02 7:30 ` Michael Tokarev
2011-03-02 8:58 ` Tejun Heo
2011-03-03 4:33 ` Robert Hancock
0 siblings, 2 replies; 6+ messages in thread
From: Michael Tokarev @ 2011-03-02 7:30 UTC (permalink / raw)
To: Robert Hancock; +Cc: Markus Trippelsdorf, Jeff Garzik, linux-ide
02.03.2011 03:54, Robert Hancock wrote:
> On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
>> FUA support is currently switched off by default in
>> drivers/ata/libata-core.c.
>> Given that many modern drives now support FUA, wouldn't it make sense
>> to switch it on by default, without requiring an (undocumented)
>> kernel/module parameter?
After reading your email, Markus, I rebooted two of my home boxes
after adding libata.fua=1 to the kernel command line. To my surprise,
only one of my three drives, the oldest, supports it. I have
two WDs: one is the famous WD20EARS (first series with "advanced
format", i.e. 4 KB sectors, 2 TB), which is less than half
a year old, and the other is a WD7500AACS, 750 GB, their
previous-generation variant; both are from the "green" series. The third
is from Hitachi, one of their "enterprise" series, a 500 GB HUA7210,
bought about 3 years ago.
Of the three, only the Hitachi reports "supports DPO and FUA" after
rebooting with fua=1.
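A small sketch for checking whether the parameter actually took effect after such a reboot. It assumes the libata `fua` module parameter is readable under `/sys/module/libata/parameters/fua`; the helper name `read_fua_param` is made up for this example. It only shows the global setting, not whether a particular drive is treated as FUA-capable.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the libata "fua" module parameter.
DEFAULT_PARAM = Path("/sys/module/libata/parameters/fua")


def read_fua_param(path: Path = DEFAULT_PARAM) -> Optional[int]:
    """Return the parameter value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing (libata not loaded, non-Linux system) or unexpected content.
        return None


print("libata.fua =", read_fua_param())
```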
> I believe I proposed this some time ago. Essentially all modern drives
> should support FUA now, since it's part of the definition of the NCQ
> (FPDMA) read/write commands. However, as I recall one of the objections
> to enabling it was that since it's just a bit in a command, there's a
> possibility that some drives may ignore it by accident or design, which
> is less likely with an explicit cache flush command. I'm not very
> inclined to agree myself (if you go down that road of pre-emptively
> predicting drive implementer stupidity, where do you stop?) but that's
> what was raised.
This is interesting in light of the above: the WDs I have definitely support
NCQ, and do it quite well (their scalability is a bit better than
that of the Hitachi), but they do not support FUA, or at least Linux
treats them that way.
> Another complication is that NCQ can be disabled at runtime either by
> user request or by error-handling fallback, and not all drives that
> support NCQ also support the FUA versions of the non-NCQ read/write
> commands, so changes in NCQ enable status may also need to result in
> changes in FUA support status on the block device.
Well, the only way to find out is to actually try enabling it.
So far, the Hitachi drive (the main drive on my
workstation: system, development, compilation, etc.) works
without issues, and kernel compile time dropped by about 2%
(I haven't run proper tests yet, so that 2% may be just
random noise; I will take a closer look at this in a few days).
> I believe the way the block layer uses it, basically it only saves the
> overhead of one transaction to the drive. It might be significant on
> some workloads (especially on high IOPS drives like SSDs) but it's
> likely not a huge deal.
One transaction per what? If it means an extra, especially "large",
transaction (like a flush with a wait) for each fsync-like call,
that can actually be a huge deal, especially on database-like
workloads (lots of small synchronous random writes).
Thanks!
/mjt
* Re: libata default FUA support
2011-03-02 7:30 ` Michael Tokarev
@ 2011-03-02 8:58 ` Tejun Heo
2011-03-02 17:29 ` Markus Trippelsdorf
2011-03-03 4:33 ` Robert Hancock
1 sibling, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2011-03-02 8:58 UTC (permalink / raw)
To: Michael Tokarev
Cc: Robert Hancock, Markus Trippelsdorf, Jeff Garzik, linux-ide
On Wed, Mar 02, 2011 at 10:30:57AM +0300, Michael Tokarev wrote:
> > I believe the way the block layer uses it, basically it only saves the
> > overhead of one transaction to the drive. It might be significant on
> > some workloads (especially on high IOPS drives like SSDs) but it's
> > likely not a huge deal.
>
> One transaction per what? If it means an extra, especially "large",
> transaction (like a flush with a wait) for each fsync-like call,
> that can actually be a huge deal, especially on database-like
> workloads (lots of small synchronous random writes).
The way flushes are used by filesystems, FUA is usually only
used right after another FLUSH, i.e. using FUA replaces the FLUSH + commit
block write + FLUSH sequence with FLUSH + FUA commit block write. Due
to the preceding FLUSH, the cache is already empty, so the only
difference between WRITE + FLUSH and FUA WRITE is the extra
command issue overhead, which is usually almost unnoticeable compared
to the actual IO.
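To make the "one command saved" point concrete, here is a simplified sketch of the two sequences for a single commit. The command strings and the `commit_commands` helper are illustrative only and do not correspond to block layer code.

```python
from typing import List


def commit_commands(use_fua: bool) -> List[str]:
    """Commands issued to the drive for one journal commit (simplified)."""
    if use_fua:
        # The FUA bit on the write replaces the trailing FLUSH.
        return ["FLUSH", "WRITE(FUA) commit block"]
    return ["FLUSH", "WRITE commit block", "FLUSH"]


without_fua = commit_commands(use_fua=False)
with_fua = commit_commands(use_fua=True)
saved = len(without_fua) - len(with_fua)
print(without_fua)
print(with_fua)
print("commands saved per commit:", saved)
```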
Another thing is that with the recent updates to block FLUSH handling,
using FUA might even be less efficient. The new implementation
aggressively merges those commit writes and flushes. IOW, depending
on timing, multiple consecutive commit writes can be merged as

  FLUSH + commit writes + FLUSH

or

  FLUSH + some commit writes + FLUSH + other commit writes + FLUSH

and so on.

These merges will happen with fsync-heavy workloads, where FLUSH
performance actually matters, and in these scenarios FUA writes are
less effective because FUA puts extra ordering restrictions on each
write, i.e. with surrounding FLUSHes the drive is free to reorder
commit writes to maximize performance, whereas with FUA the disk has to
jump around all over the place to execute each command in the exact issue
order.
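A rough sketch of what merging does to the flush count, using the first merged pattern above. This is a toy model under the assumption that all commit writes land in a single merged group; the function names are hypothetical. It suggests why the trailing FLUSH that FUA would save matters less as more commits are merged.

```python
from typing import List


def merged_sequence(num_commits: int) -> List[str]:
    """One merged group: FLUSH + all commit writes + FLUSH."""
    return ["FLUSH"] + ["WRITE commit"] * num_commits + ["FLUSH"]


def flushes_per_commit(num_commits: int) -> float:
    """Average number of FLUSH commands per commit write in a merged group."""
    seq = merged_sequence(num_commits)
    return seq.count("FLUSH") / num_commits


for n in (1, 4, 16):
    print(n, len(merged_sequence(n)), flushes_per_commit(n))
```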
I personally think FUA is a misfeature. It's a microoptimization with
shallow benefits even when used properly while putting much heavier
restriction on actual IO order, which usually is the slow part.
That said, if someone can show FUA actually brings noticeable
performance benefits, sure, let's do it, but till then, I think it
would be best to leave it up in the attic.
Thanks.
--
tejun
* Re: libata default FUA support
2011-03-02 8:58 ` Tejun Heo
@ 2011-03-02 17:29 ` Markus Trippelsdorf
0 siblings, 0 replies; 6+ messages in thread
From: Markus Trippelsdorf @ 2011-03-02 17:29 UTC (permalink / raw)
To: Tejun Heo; +Cc: Michael Tokarev, Robert Hancock, Jeff Garzik, linux-ide
On 2011.03.02 at 09:58 +0100, Tejun Heo wrote:
> On Wed, Mar 02, 2011 at 10:30:57AM +0300, Michael Tokarev wrote:
> > > I believe the way the block layer uses it, basically it only saves the
> > > overhead of one transaction to the drive. It might be significant on
> > > some workloads (especially on high IOPS drives like SSDs) but it's
> > > likely not a huge deal.
> >
> > One transaction per what? If it means an extra, especially "large",
> > transaction (like a flush with a wait) for each fsync-like call,
> > that can actually be a huge deal, especially on database-like
> > workloads (lots of small synchronous random writes).
>
> The way flushes are used by filesystems, FUA is usually only
> used right after another FLUSH, i.e. using FUA replaces the FLUSH + commit
> block write + FLUSH sequence with FLUSH + FUA commit block write. Due
> to the preceding FLUSH, the cache is already empty, so the only
> difference between WRITE + FLUSH and FUA WRITE is the extra
> command issue overhead, which is usually almost unnoticeable compared
> to the actual IO.
>
> Another thing is that with the recent updates to block FLUSH handling,
> using FUA might even be less efficient. The new implementation
> aggressively merges those commit writes and flushes. IOW, depending
> on timing, multiple consecutive commit writes can be merged as,
>
> FLUSH + commit writes + FLUSH
>
> or
>
> FLUSH + some commit writes + FLUSH + other commit writes + FLUSH
>
> and so on,
>
> These merges will happen with fsync-heavy workloads, where FLUSH
> performance actually matters, and in these scenarios FUA writes are
> less effective because FUA puts extra ordering restrictions on each
> write, i.e. with surrounding FLUSHes the drive is free to reorder
> commit writes to maximize performance, whereas with FUA the disk has to
> jump around all over the place to execute each command in the exact issue
> order.
>
> I personally think FUA is a misfeature. It's a microoptimization with
> shallow benefits even when used properly while putting much heavier
> restriction on actual IO order, which usually is the slow part.
Thanks for the detailed information. Just to confirm your point, here are
some benchmark results.

Setup: Seagate ST1500DL003 1.5TB 5900rpm, xfs (delaylog), ffsb
(http://sourceforge.net/projects/ffsb/), pure random write benchmark.

1) Total Results: 30sec run, 1 thread, 104*35MB files

Op Name          Transactions  Trans/sec  % Trans   % Op Weight  Throughput
=======          ============  =========  =======   ===========  ==========
FUA:    write :        435456    1183.44  100.000%     100.000%   162MB/sec
no FUA: write :        441600    1243.47  100.000%     100.000%   170MB/sec

System Call Latency statistics in millisecs

                 Min       Avg       Max           Total Calls
                 ========  ========  ============  ============
[ write] FUA     0.000000  0.070392   5444.638184        435456
[ write] no FUA  0.000000  0.069718   4715.519043        441600

2) Total Results: 240sec run, 2 threads, 104*35MB files

Op Name          Transactions  Trans/sec  % Trans   % Op Weight  Throughput
=======          ============  =========  =======   ===========  ==========
FUA:    write :        594944     919.45  100.000%     100.000%   126MB/sec
no FUA: write :        653824    1097.31  100.000%     100.000%   150MB/sec

System Call Latency statistics in millisecs

                 Min       Avg       Max           Total Calls
                 ========  ========  ============  ============
[ write] FUA     0.000000  0.812704  13467.903320        594944
[ write] no FUA  0.000000  0.727761   9695.806641        653824
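For a quick comparison, the relative gap between the two configurations can be computed from the Trans/sec column. This is only arithmetic on the figures reported above; the `percent_lower` helper and the run labels are names chosen for this sketch.

```python
# Trans/sec figures copied from the two ffsb runs above.
runs = {
    "run 1 (30 s, 1 thread)": {"fua": 1183.44, "no_fua": 1243.47},
    "run 2 (240 s, 2 threads)": {"fua": 919.45, "no_fua": 1097.31},
}


def percent_lower(fua: float, no_fua: float) -> float:
    """How much lower the FUA rate is, as a percentage of the no-FUA rate."""
    return (no_fua - fua) / no_fua * 100.0


for name, r in runs.items():
    gap = percent_lower(r["fua"], r["no_fua"])
    print(f"{name}: FUA is {gap:.1f}% lower in trans/sec")
```

On these numbers FUA comes out roughly 5% lower in the short single-threaded run and roughly 16% lower in the longer two-thread run. These are single runs on one drive, so the figures are indicative rather than conclusive.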
--
Markus
[-- Attachment #2: random_writes.ffsb --]
[-- Type: text/plain, Size: 968 bytes --]
# Large file random writes.
# 104 files, 35MB per file.
time=240
alignio=1
[filesystem0]
location=/var/tmp/fs_bench
num_files=104
min_filesize=36700160 # 35 MB
max_filesize=36700160
reuse=1
[end0]
[threadgroup0]
num_threads=2
write_random=1
write_weight=1
write_size=1048576 # 1 MB
write_blocksize=4096
[stats]
enable_stats=1
enable_range=1
msec_range 0.00 0.01
msec_range 0.01 0.02
msec_range 0.02 0.05
msec_range 0.05 0.10
msec_range 0.10 0.20
msec_range 0.20 0.50
msec_range 0.50 1.00
msec_range 1.00 2.00
msec_range 2.00 5.00
msec_range 5.00 10.00
msec_range 10.00 20.00
msec_range 20.00 50.00
msec_range 50.00 100.00
msec_range 100.00 200.00
msec_range 200.00 500.00
msec_range 500.00 1000.00
msec_range 1000.00 2000.00
msec_range 2000.00 5000.00
msec_range 5000.00 10000.00
[end]
[end0]
* Re: libata default FUA support
2011-03-02 7:30 ` Michael Tokarev
2011-03-02 8:58 ` Tejun Heo
@ 2011-03-03 4:33 ` Robert Hancock
1 sibling, 0 replies; 6+ messages in thread
From: Robert Hancock @ 2011-03-03 4:33 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Markus Trippelsdorf, Jeff Garzik, linux-ide
On Wed, Mar 2, 2011 at 1:30 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> 02.03.2011 03:54, Robert Hancock wrote:
>> On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
>>> FUA support is currently switched off by default in
>>> drivers/ata/libata-core.c.
>>> Given that many modern drives now support FUA, wouldn't it make sense
>>> to switch it on by default, without requiring an (undocumented)
>>> kernel/module parameter?
>
> After reading your email, Markus, I rebooted two of my home boxes
> after adding libata.fua=1 to the kernel command line. To my surprise,
> only one of my three drives, the oldest, supports it. I have
> two WDs: one is the famous WD20EARS (first series with "advanced
> format", i.e. 4 KB sectors, 2 TB), which is less than half
> a year old, and the other is a WD7500AACS, 750 GB, their
> previous-generation variant; both are from the "green" series. The third
> is from Hitachi, one of their "enterprise" series, a 500 GB HUA7210,
> bought about 3 years ago.
> Of the three, only the Hitachi reports "supports DPO and FUA" after
> rebooting with fua=1.
That only refers to the non-NCQ FUA support. FUA support for NCQ
appears to be mandatory but libata doesn't currently do this (i.e. FUA
is only reported if the drive reports the non-NCQ FUA commands are
supported).