libata default FUA support

Linux ATA/IDE development

* libata default FUA support
From: Markus Trippelsdorf @ 2011-03-01 20:33 UTC (permalink / raw)
  To: Jeff Garzik, linux-ide

FUA support is currently switched off by default in
drivers/ata/libata-core.c. 
Given that many modern drives now support FUA, wouldn't it make sense
to switch it on by default, without requiring an (undocumented)
kernel/module parameter?
-- 
Markus


* Re: libata default FUA support
From: Robert Hancock @ 2011-03-02  0:54 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Jeff Garzik, linux-ide

On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
> FUA support is currently switched off by default in
> drivers/ata/libata-core.c.
> Given that many modern drives do support FUA now, wouldn't it make sense
> to switch it on without setting an (undocumented) kernel/module
> parameter?

I believe I proposed this some time ago. Essentially all modern drives 
should support FUA now, since it's part of the definition of the NCQ 
(FPDMA) read/write commands. However, as I recall one of the objections 
to enabling it was that since it's just a bit in a command, there's a 
possibility that some drives may ignore it by accident or design, which 
is less likely with an explicit cache flush command. I'm not very 
inclined to agree myself (if you go down that road of pre-emptively 
predicting drive implementer stupidity, where do you stop?) but that's 
what was raised.

Another complication is that NCQ can be disabled at runtime either by 
user request or by error-handling fallback, and not all drives that 
support NCQ also support the FUA versions of the non-NCQ read/write 
commands, so changes in NCQ enable status may also need to result in 
changes in FUA support status on the block device.
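
As a minimal sketch of that dependency (Python, with hypothetical
function and parameter names; this models the policy being discussed,
not what libata currently implements):

```python
def fua_usable(has_ncq: bool, ncq_enabled: bool, has_non_ncq_fua: bool) -> bool:
    """Return True if a FUA write could be issued in the current state.

    All three inputs are assumptions of this sketch: what the drive
    advertises, and whether NCQ is currently switched on for it.
    """
    if has_ncq and ncq_enabled:
        # The FUA bit is part of the NCQ (FPDMA) write command itself.
        return True
    # Without NCQ, FUA needs the separate non-NCQ FUA write commands.
    return has_non_ncq_fua


# A drive with NCQ but without the non-NCQ FUA commands loses FUA
# as soon as NCQ is turned off (by the user or by error handling).
before = fua_usable(has_ncq=True, ncq_enabled=True, has_non_ncq_fua=False)
after = fua_usable(has_ncq=True, ncq_enabled=False, has_non_ncq_fua=False)
print(before, after)
```

The point of the sketch is only that the answer can change at runtime,
so whatever is advertised to the block layer would have to be updated
when the NCQ state changes.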

I believe the way the block layer uses it, basically it only saves the 
overhead of one transaction to the drive. It might be significant on 
some workloads (especially on high IOPS drives like SSDs) but it's 
likely not a huge deal.


* Re: libata default FUA support
From: Michael Tokarev @ 2011-03-02  7:30 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Markus Trippelsdorf, Jeff Garzik, linux-ide

02.03.2011 03:54, Robert Hancock wrote:
> On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
>> FUA support is currently switched off by default in
>> drivers/ata/libata-core.c.
>> Given that many modern drives do support FUA now, wouldn't it make sense
>> to switch it on without setting an (undocumented) kernel/module
>> parameter?

After reading your email, Markus, I rebooted two of my home boxes
after adding libata.fua=1 to the kernel command line.  To my surprise,
only one of the 3 drives I have, the oldest, supports it.  I have
two WDs: one is the famous WD20EARS (first series with "advanced
format", i.e. 4KB sectors, and 2TB size), which is less than half
a year old, and the other is a WD7500AACS, 750GB, their previous-generation
variant; both are from the "green" series.  The third is from Hitachi,
one of their "enterprise" series, a 500GB HUA7210, bought about 3 years ago.
Of the 3, only the Hitachi reports "supports DPO and FUA" after
rebooting with fua=1.

> I believe I proposed this some time ago. Essentially all modern drives
> should support FUA now, since it's part of the definition of the NCQ
> (FPDMA) read/write commands. However, as I recall one of the objections
> to enabling it was that since it's just a bit in a command, there's a
> possibility that some drives may ignore it by accident or design, which
> is less likely with an explicit cache flush command. I'm not very
> inclined to agree myself (if you go down that road of pre-emptively
> predicting drive implementer stupidity, where do you stop?) but that's
> what was raised.

This is interesting given the above: the WDs I have definitely support
NCQ, and do that quite well (their scalability is a bit better than
that of the Hitachi), but they do not support FUA, or at least Linux
treats them as such.

> Another complication is that NCQ can be disabled at runtime either by
> user request or by error-handling fallback, and not all drives that
> support NCQ also support the FUA versions of the non-NCQ read/write
> commands, so changes in NCQ enable status may also need to result in
> changes in FUA support status on the block device.

Well, the only way to find out is to actually try to enable it.
So far, the Hitachi drive (which is the main drive on my
workstation: system, development, compilation etc.) works
without issues, and kernel compile time dropped by about 2%
(I haven't run proper tests so far, so that 2% may just be
random noise; I will take a closer look at this in a few days).

> I believe the way the block layer uses it, basically it only saves the
> overhead of one transaction to the drive. It might be significant on
> some workloads (especially on high IOPS drives like SSDs) but it's
> likely not a huge deal.

One transaction per what?  If it means an extra, especially "large"
transaction (like a flush with a wait) for each fsync-like call,
that can be a huge deal actually, especially on database-like
workloads (lots of small synchronous random writes).

Thanks!

/mjt


* Re: libata default FUA support
From: Tejun Heo @ 2011-03-02  8:58 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Robert Hancock, Markus Trippelsdorf, Jeff Garzik, linux-ide

On Wed, Mar 02, 2011 at 10:30:57AM +0300, Michael Tokarev wrote:
> > I believe the way the block layer uses it, basically it only saves the
> > overhead of one transaction to the drive. It might be significant on
> > some workloads (especially on high IOPS drives like SSDs) but it's
> > likely not a huge deal.
> 
> One transaction per what?  If it means an extra, especially "large"
> transaction (like a flush with a wait) for each fsync-like call,
> that can be a huge deal actually, especially on database-like
> workloads (lots of small synchronous random writes).

The way flushes are used by filesystems is that FUA is usually only
used right after another FLUSH, i.e. using FUA replaces the FLUSH +
commit block write + FLUSH sequence with FLUSH + FUA commit block
write.  Due to the preceding FLUSH, the cache is already empty, so the
only difference between WRITE + FLUSH and FUA WRITE becomes the extra
command issue overhead, which is usually almost unnoticeable compared
to the actual IO.

Another thing is that with the recent updates to block FLUSH handling,
using FUA might even be less efficient.  The new implementation
aggressively merges those commit writes and flushes.  IOW, depending
on timing, multiple consecutive commit writes can be merged as,

 FLUSH + commit writes + FLUSH

or

 FLUSH + some commit writes + FLUSH + other commit writes + FLUSH

and so on.

These merges will happen with fsync-heavy workloads where FLUSH
performance actually matters and, in these scenarios, FUA writes are
less effective because each FUA write carries extra ordering
restrictions, i.e. with surrounding FLUSHes, the drive is free to
reorder commit writes to maximize performance; with FUA, the disk has
to jump around all over the place to execute each command in the exact
issue order.
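
To make the comparison concrete, here is a toy command count in Python.
It is only a sketch: the function names are made up, it assumes FUA
commits are issued one by one as FLUSH + FUA write, and it counts
commands only, so it says nothing about seek cost, which is the part
that usually dominates.

```python
def commands_without_fua(commits: int, merged: bool) -> int:
    """Commands issued for `commits` commit writes using plain FLUSHes."""
    if merged:
        # One pre-flush, all commit writes, one post-flush.
        return 1 + commits + 1
    # Each commit gets its own FLUSH + commit write + FLUSH.
    return 3 * commits


def commands_with_fua(commits: int) -> int:
    """Commands issued when each commit is FLUSH + FUA commit write."""
    return 2 * commits


for n in (1, 8):
    print(n,
          commands_without_fua(n, merged=False),
          commands_without_fua(n, merged=True),
          commands_with_fua(n))
```

For a single commit FUA saves one command (2 vs 3), but once several
commits are merged between one pair of FLUSHes the merged sequence
issues fewer commands than the FUA one (10 vs 16 for 8 commits in this
model), and the drive may still reorder the writes in between.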

I personally think FUA is a misfeature.  It's a micro-optimization with
shallow benefits even when used properly, while putting much heavier
restrictions on actual IO order, which usually is the slow part.

That said, if someone can show FUA actually brings noticeable
performance benefits, sure, let's do it, but till then, I think it
would be best to leave it up in the attic.

Thanks.

-- 
tejun


* Re: libata default FUA support
From: Markus Trippelsdorf @ 2011-03-02 17:29 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Michael Tokarev, Robert Hancock, Jeff Garzik, linux-ide

On 2011.03.02 at 09:58 +0100, Tejun Heo wrote:
> On Wed, Mar 02, 2011 at 10:30:57AM +0300, Michael Tokarev wrote:
> > > I believe the way the block layer uses it, basically it only saves the
> > > overhead of one transaction to the drive. It might be significant on
> > > some workloads (especially on high IOPS drives like SSDs) but it's
> > > likely not a huge deal.
> > 
> > One transaction per what?  If it means an extra, especially "large"
> > transaction (like a flush with a wait) for each fsync-like call,
> > that can be a huge deal actually, especially on database-like
> > workloads (lots of small synchronous random writes).
> 
> The way flushes are used by filesystems is that FUA is usually only
> used right after another FLUSH, i.e. using FUA replaces the FLUSH +
> commit block write + FLUSH sequence with FLUSH + FUA commit block
> write.  Due to the preceding FLUSH, the cache is already empty, so the
> only difference between WRITE + FLUSH and FUA WRITE becomes the extra
> command issue overhead, which is usually almost unnoticeable compared
> to the actual IO.
> 
> Another thing is that with the recent updates to block FLUSH handling,
> using FUA might even be less efficient.  The new implementation
> aggressively merges those commit writes and flushes.  IOW, depending
> on timing, multiple consecutive commit writes can be merged as,
> 
>  FLUSH + commit writes + FLUSH
> 
> or
> 
>  FLUSH + some commit writes + FLUSH + other commit writes + FLUSH
> 
> and so on,
> 
> These merges will happen with fsync-heavy workloads where FLUSH
> performance actually matters and, in these scenarios, FUA writes are
> less effective because each FUA write carries extra ordering
> restrictions, i.e. with surrounding FLUSHes, the drive is free to
> reorder commit writes to maximize performance; with FUA, the disk has
> to jump around all over the place to execute each command in the exact
> issue order.
> 
> I personally think FUA is a misfeature.  It's a micro-optimization with
> shallow benefits even when used properly, while putting much heavier
> restrictions on actual IO order, which usually is the slow part.

Thanks for the detailed information. Just to confirm your point, here are
some benchmark results:

Seagate ST1500DL003 1.5TB 5900rpm, xfs (delaylog), ffsb
(http://sourceforge.net/projects/ffsb/), pure random write benchmark:

1)
Total Results 30sec run, 1 thread, 104*35MB files
===============

             Op Name   Transactions      Trans/sec      % Trans     % Op Weight    Throughput
             =======   ============      =========      =======     ===========    ==========
FUA:         write :   435456            1183.44        100.000%    100.000%       162MB/sec
no FUA:      write :   441600            1243.47        100.000%    100.000%       170MB/sec

System Call Latency statistics in millisecs

                Min             Avg             Max             Total Calls
                ========        ========        ========        ============
[  write]FUA    0.000000        0.070392        5444.638184           435456
[  write]no FUA 0.000000        0.069718        4715.519043           441600

2)
Total Results 240sec run, 2 threads, 104*35MB files
===============
             Op Name   Transactions      Trans/sec      % Trans     % Op Weight    Throughput
             =======   ============      =========      =======     ===========    ==========
FUA:         write :   594944            919.45         100.000%    100.000%       126MB/sec
no FUA:      write :   653824            1097.31        100.000%    100.000%       150MB/sec

System Call Latency statistics in millisecs

                Min             Avg             Max             Total Calls
                ========        ========        ========        ============
[  write]FUA    0.000000        0.812704        13467.903320          594944
[  write]no FUA 0.000000        0.727761        9695.806641           653824
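
For reference, the relative difference in the Trans/sec column can be
worked out as below (a small Python sketch; the helper name is made up
and the inputs are just the four Trans/sec values from the tables
above):

```python
def percent_drop(with_fua: float, without_fua: float) -> float:
    """How much lower the FUA figure is, as a percentage of the no-FUA one."""
    return (1.0 - with_fua / without_fua) * 100.0


# (FUA, no FUA) transactions per second, copied from the two runs above.
runs = {
    "30sec, 1 thread": (1183.44, 1243.47),
    "240sec, 2 threads": (919.45, 1097.31),
}

for name, (fua, no_fua) in runs.items():
    print(f"{name}: FUA is {percent_drop(fua, no_fua):.1f}% lower in trans/sec")
```

That is roughly a 5% drop in the short single-threaded run and roughly
a 16% drop in the longer two-threaded run.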

-- 
Markus

[-- Attachment #2: random_writes.ffsb --]
[-- Type: text/plain, Size: 968 bytes --]

# Large file random writes.
# 104 files, 35MB per file.

time=240
alignio=1

[filesystem0]
	location=/var/tmp/fs_bench
	num_files=104
	min_filesize=36700160  # 35 MB
	max_filesize=36700160
	reuse=1
[end0]

[threadgroup0]
	num_threads=2

	write_random=1
	write_weight=1

	write_size=1048576  # 1 MB
	write_blocksize=4096

	[stats]
		enable_stats=1
		enable_range=1

		msec_range    0.00      0.01
		msec_range    0.01      0.02
		msec_range    0.02      0.05
		msec_range    0.05      0.10
		msec_range    0.10      0.20
		msec_range    0.20      0.50
		msec_range    0.50      1.00
		msec_range    1.00      2.00
		msec_range    2.00      5.00
		msec_range    5.00     10.00
		msec_range   10.00     20.00
		msec_range   20.00     50.00
		msec_range   50.00    100.00
		msec_range  100.00    200.00
		msec_range  200.00    500.00
		msec_range  500.00   1000.00
		msec_range 1000.00   2000.00
		msec_range 2000.00   5000.00
		msec_range 5000.00  10000.00
	[end]
[end0]


* Re: libata default FUA support
From: Robert Hancock @ 2011-03-03  4:33 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Markus Trippelsdorf, Jeff Garzik, linux-ide

On Wed, Mar 2, 2011 at 1:30 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> 02.03.2011 03:54, Robert Hancock wrote:
>> On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
>>> FUA support is currently switched off by default in
>>> drivers/ata/libata-core.c.
>>> Given that many modern drives do support FUA now, wouldn't it make sense
>>> to switch it on without setting an (undocumented) kernel/module
>>> parameter?
>
> After reading your email, Markus, I rebooted two of my home boxes
> after adding libata.fua=1 to the kernel command line.  To my surprise,
> only one of the 3 drives I have, the oldest, supports it.  I have
> two WDs: one is the famous WD20EARS (first series with "advanced
> format", i.e. 4KB sectors, and 2TB size), which is less than half
> a year old, and the other is a WD7500AACS, 750GB, their previous-generation
> variant; both are from the "green" series.  The third is from Hitachi,
> one of their "enterprise" series, a 500GB HUA7210, bought about 3 years ago.
> Of the 3, only the Hitachi reports "supports DPO and FUA" after
> rebooting with fua=1.

That only refers to the non-NCQ FUA support. FUA support for NCQ
appears to be mandatory, but libata doesn't currently take that into
account (i.e. FUA is only reported if the drive reports that the
non-NCQ FUA commands are supported).
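
A rough Python sketch of the two capability bits involved, for anyone
who wants to check their own drives. The word and bit positions (word 76
bit 8 for NCQ, word 84 bit 6 for the non-NCQ FUA write commands, with
bits 15:14 of word 84 reading 01b when the word is valid) are my
understanding of the IDENTIFY DEVICE layout and should be checked
against the ATA spec; the function names are made up, and the sample
data is synthetic, not taken from a real drive.

```python
ID_SATA_CAPABILITY = 76  # IDENTIFY word 76: Serial ATA capabilities
ID_CFSSE = 84            # IDENTIFY word 84: command/feature sets supported


def id_has_ncq(words):
    """True if the IDENTIFY data advertises NCQ (word 76, bit 8)."""
    cap = words[ID_SATA_CAPABILITY]
    # 0x0000 and 0xFFFF mean the word is not populated.
    if cap in (0x0000, 0xFFFF):
        return False
    return bool(cap & (1 << 8))


def id_has_non_ncq_fua(words):
    """True if the non-NCQ FUA write commands are advertised (word 84, bit 6)."""
    w = words[ID_CFSSE]
    # Bits 15:14 must read 01b for the word to be considered valid.
    if (w & 0xC000) != 0x4000:
        return False
    return bool(w & (1 << 6))


# Synthetic IDENTIFY data: NCQ advertised, non-NCQ FUA not advertised.
words = [0] * 256
words[ID_SATA_CAPABILITY] = 1 << 8
words[ID_CFSSE] = 0x4000
print(id_has_ncq(words), id_has_non_ncq_fua(words))
```

A drive that decodes like the synthetic example would do NCQ but would
still be reported as not supporting FUA under the current check.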

>
>> I believe I proposed this some time ago. Essentially all modern drives
>> should support FUA now, since it's part of the definition of the NCQ
>> (FPDMA) read/write commands. However, as I recall one of the objections
>> to enabling it was that since it's just a bit in a command, there's a
>> possibility that some drives may ignore it by accident or design, which
>> is less likely with an explicit cache flush command. I'm not very
>> inclined to agree myself (if you go down that road of pre-emptively
>> predicting drive implementer stupidity, where do you stop?) but that's
>> what was raised.
>
> This is interesting given the above: the WDs I have definitely support
> NCQ, and do that quite well (their scalability is a bit better than
> that of the Hitachi), but they do not support FUA, or at least Linux
> treats them as such.
>
>> Another complication is that NCQ can be disabled at runtime either by
>> user request or by error-handling fallback, and not all drives that
>> support NCQ also support the FUA versions of the non-NCQ read/write
>> commands, so changes in NCQ enable status may also need to result in
>> changes in FUA support status on the block device.
>
> Well, the only way to find out is to actually try to enable it.
> So far, the Hitachi drive (which is the main drive on my
> workstation: system, development, compilation etc.) works
> without issues, and kernel compile time dropped by about 2%
> (I haven't run proper tests so far, so that 2% may just be
> random noise; I will take a closer look at this in a few days).
>
>> I believe the way the block layer uses it, basically it only saves the
>> overhead of one transaction to the drive. It might be significant on
>> some workloads (especially on high IOPS drives like SSDs) but it's
>> likely not a huge deal.
>
> One transaction per what?  If it means an extra, especially "large"
> transaction (like a flush with a wait) for each fsync-like call,
> that can be a huge deal actually, especially on database-like
> workloads (lots of small synchronous random writes).
>
> Thanks!
>
> /mjt
>

