* libata default FUA support
From: Markus Trippelsdorf @ 2011-03-01 20:33 UTC
To: Jeff Garzik, linux-ide

FUA support is currently switched off by default in
drivers/ata/libata-core.c.
Given that many modern drives do support FUA now, wouldn't it make sense
to switch it on without setting an (undocumented) kernel/module
parameter?

-- 
Markus

* Re: libata default FUA support
From: Robert Hancock @ 2011-03-02 0:54 UTC
To: Markus Trippelsdorf; +Cc: Jeff Garzik, linux-ide

On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
> FUA support is currently switched off by default in
> drivers/ata/libata-core.c.
> Given that many modern drives do support FUA now, wouldn't it make sense
> to switch it on without setting an (undocumented) kernel/module
> parameter?

I believe I proposed this some time ago. Essentially all modern drives
should support FUA now, since it's part of the definition of the NCQ
(FPDMA) read/write commands. However, as I recall, one of the objections
to enabling it was that since it's just a bit in a command, there's a
possibility that some drives may ignore it by accident or design, which
is less likely with an explicit cache flush command. I'm not very
inclined to agree myself (if you go down that road of pre-emptively
predicting drive implementer stupidity, where do you stop?) but that's
what was raised.

Another complication is that NCQ can be disabled at runtime either by
user request or by error-handling fallback, and not all drives that
support NCQ also support the FUA versions of the non-NCQ read/write
commands, so changes in NCQ enable status may also need to result in
changes in FUA support status on the block device.

I believe the way the block layer uses it, basically it only saves the
overhead of one transaction to the drive. It might be significant on
some workloads (especially on high IOPS drives like SSDs) but it's
likely not a huge deal.

* Re: libata default FUA support
From: Michael Tokarev @ 2011-03-02 7:30 UTC
To: Robert Hancock; +Cc: Markus Trippelsdorf, Jeff Garzik, linux-ide

02.03.2011 03:54, Robert Hancock wrote:
> On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
>> FUA support is currently switched off by default in
>> drivers/ata/libata-core.c.
>> Given that many modern drives do support FUA now, wouldn't it make sense
>> to switch it on without setting an (undocumented) kernel/module
>> parameter?

After reading your email, Markus, I rebooted two of my home boxes
after adding libata.fua=1 to the kernel command line. To my surprise,
only one of my 3 drives, the oldest, supports it. I have two WDs:
one is the famous WD20EARS (first series with "advanced format",
i.e. 4 KB sectors, 2 TB), which is less than half a year old, and
the other is a WD7500AACS, 750 GB, their previous-generation variant;
both are from the "green" series. The third is from Hitachi, one of
their "enterprise" series, a 500 GB HUA7210, bought about 3 years ago.
Of the 3, only the Hitachi reports "supports DPO and FUA" after
rebooting with fua=1.

> I believe I proposed this some time ago. Essentially all modern drives
> should support FUA now, since it's part of the definition of the NCQ
> (FPDMA) read/write commands. However, as I recall one of the objections
> to enabling it was that since it's just a bit in a command, there's a
> possibility that some drives may ignore it by accident or design, which
> is less likely with an explicit cache flush command. I'm not very
> inclined to agree myself (if you go down that road of pre-emptively
> predicting drive implementer stupidity, where do you stop?) but that's
> what was raised.

This is interesting given the above: the WDs I have definitely support
NCQ, and do that quite well (their scalability is a bit better than
that of the Hitachi), but do not support FUA, or at least Linux
treats them as such.

> Another complication is that NCQ can be disabled at runtime either by
> user request or by error-handling fallback, and not all drives that
> support NCQ also support the FUA versions of the non-NCQ read/write
> commands, so changes in NCQ enable status may also need to result in
> changes in FUA support status on the block device.

Well, the only way to find out is to actually try to enable it.
So far, the Hitachi drive (which is the main drive on my
workstation -- system, development, compilation etc.) works
without issues, and kernel compile time dropped by about 2%
(I haven't run proper tests so far, so that 2% may be just
random noise; I will take a closer look at this in a few days).

> I believe the way the block layer uses it, basically it only saves the
> overhead of one transaction to the drive. It might be significant on
> some workloads (especially on high IOPS drives like SSDs) but it's
> likely not a huge deal.

One transaction per what? If it means an extra, especially "large"
transaction (like a flush with a wait) per each fsync-like call,
that can actually be a huge deal, especially on database-like
workloads (lots of small synchronous random writes).

Thanks!

/mjt

* Re: libata default FUA support
From: Tejun Heo @ 2011-03-02 8:58 UTC
To: Michael Tokarev
Cc: Robert Hancock, Markus Trippelsdorf, Jeff Garzik, linux-ide

On Wed, Mar 02, 2011 at 10:30:57AM +0300, Michael Tokarev wrote:
> > I believe the way the block layer uses it, basically it only saves the
> > overhead of one transaction to the drive. It might be significant on
> > some workloads (especially on high IOPS drives like SSDs) but it's
> > likely not a huge deal.
>
> One transaction per what? If it means an extra, especially "large"
> transaction (like a flush with a wait) per each fsync-like call,
> that can actually be a huge deal, especially on database-like
> workloads (lots of small synchronous random writes).

The way flushes are used by filesystems is that FUA is usually only
used right after another FLUSH, i.e. using FUA replaces the FLUSH +
commit block write + FLUSH sequence with FLUSH + FUA commit block
write. Due to the preceding FLUSH, the cache is already empty, so the
only difference between WRITE + FLUSH and FUA WRITE becomes the extra
command issue overhead, which is usually almost unnoticeable compared
to the actual IO.

Another thing is that with the recent updates to block FLUSH handling,
using FUA might even be less efficient. The new implementation
aggressively merges those commit writes and flushes. IOW, depending
on timing, multiple consecutive commit writes can be merged as

  FLUSH + commit writes + FLUSH

or

  FLUSH + some commit writes + FLUSH + other commit writes + FLUSH

and so on.

These merges will happen with fsync-heavy workloads where FLUSH
performance actually matters and, in these scenarios, FUA writes are
less effective because FUA puts extra ordering restrictions on each
FUA write, i.e. with surrounding FLUSHes, the drive is free to reorder
commit writes to maximize performance; with FUA, the disk has to jump
around all over the place to execute each command in the exact issue
order.

I personally think FUA is a misfeature. It's a microoptimization with
shallow benefits even when used properly, while putting a much heavier
restriction on actual IO order, which usually is the slow part.

That said, if someone can show FUA actually brings noticeable
performance benefits, sure, let's do it, but till then, I think it
would be best to leave it up in the attic.

Thanks.

-- 
tejun

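The command sequences described in the message above can be counted with a small model. This is only a sketch: the function names are hypothetical, the "merged" case assumes the best case where every commit write lands in a single batch, and the model counts commands issued to the drive without modelling seek time or reordering, which is the part the message argues actually dominates.

```python
def commands_flush_bracketed(n_commits: int) -> int:
    """FLUSH + commit write + FLUSH, issued separately for every commit."""
    return 3 * n_commits


def commands_fua(n_commits: int) -> int:
    """FLUSH + FUA commit write, issued separately for every commit."""
    return 2 * n_commits


def commands_merged(n_commits: int) -> int:
    """FLUSH + all commit writes + FLUSH, assuming one fully merged batch."""
    return n_commits + 2 if n_commits else 0


if __name__ == "__main__":
    for n in (1, 8):
        print(n, commands_flush_bracketed(n), commands_fua(n), commands_merged(n))
```

For a single commit, FUA saves one command out of three. Once several commits merge between one pair of flushes, the merged pattern issues fewer commands than per-commit FUA, and the drive stays free to reorder the writes inside the batch.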
* Re: libata default FUA support
From: Markus Trippelsdorf @ 2011-03-02 17:29 UTC
To: Tejun Heo; +Cc: Michael Tokarev, Robert Hancock, Jeff Garzik, linux-ide

On 2011.03.02 at 09:58 +0100, Tejun Heo wrote:
> On Wed, Mar 02, 2011 at 10:30:57AM +0300, Michael Tokarev wrote:
> > > I believe the way the block layer uses it, basically it only saves the
> > > overhead of one transaction to the drive. It might be significant on
> > > some workloads (especially on high IOPS drives like SSDs) but it's
> > > likely not a huge deal.
> >
> > One transaction per what? If it means an extra, especially "large"
> > transaction (like a flush with a wait) per each fsync-like call,
> > that can actually be a huge deal, especially on database-like
> > workloads (lots of small synchronous random writes).
>
> The way flushes are used by filesystems is that FUA is usually only
> used right after another FLUSH, i.e. using FUA replaces the FLUSH +
> commit block write + FLUSH sequence with FLUSH + FUA commit block
> write. Due to the preceding FLUSH, the cache is already empty, so the
> only difference between WRITE + FLUSH and FUA WRITE becomes the extra
> command issue overhead, which is usually almost unnoticeable compared
> to the actual IO.
>
> Another thing is that with the recent updates to block FLUSH handling,
> using FUA might even be less efficient. The new implementation
> aggressively merges those commit writes and flushes. IOW, depending
> on timing, multiple consecutive commit writes can be merged as
>
>   FLUSH + commit writes + FLUSH
>
> or
>
>   FLUSH + some commit writes + FLUSH + other commit writes + FLUSH
>
> and so on.
>
> These merges will happen with fsync-heavy workloads where FLUSH
> performance actually matters and, in these scenarios, FUA writes are
> less effective because FUA puts extra ordering restrictions on each
> FUA write, i.e. with surrounding FLUSHes, the drive is free to reorder
> commit writes to maximize performance; with FUA, the disk has to jump
> around all over the place to execute each command in the exact issue
> order.
>
> I personally think FUA is a misfeature. It's a microoptimization with
> shallow benefits even when used properly, while putting a much heavier
> restriction on actual IO order, which usually is the slow part.

Thanks for the detailed information. Just to confirm your point, here
are some benchmark results (Seagate ST1500DL003 1.5TB 5900rpm, xfs
(delaylog), ffsb ( http://sourceforge.net/projects/ffsb/ ) pure random
write benchmark):

1) Total Results 30sec run, 1 thread, 104*35MB files

         Op Name   Transactions   Trans/sec   % Trans   % Op Weight   Throughput
         =======   ============   =========   =======   ===========   ==========
FUA:     write :         435456     1183.44  100.000%      100.000%    162MB/sec
no FUA:  write :         441600     1243.47  100.000%      100.000%    170MB/sec

System Call Latency statistics in millisecs
                       Min        Avg            Max   Total Calls
                  ========   ========       ========  ============
[ write] FUA      0.000000   0.070392    5444.638184        435456
[ write] no FUA   0.000000   0.069718    4715.519043        441600

2) Total Results 240sec run, 2 threads, 104*35MB files

         Op Name   Transactions   Trans/sec   % Trans   % Op Weight   Throughput
         =======   ============   =========   =======   ===========   ==========
FUA:     write :         594944      919.45  100.000%      100.000%    126MB/sec
no FUA:  write :         653824     1097.31  100.000%      100.000%    150MB/sec

System Call Latency statistics in millisecs
                       Min        Avg            Max   Total Calls
                  ========   ========       ========  ============
[ write] FUA      0.000000   0.812704   13467.903320        594944
[ write] no FUA   0.000000   0.727761    9695.806641        653824

-- 
Markus

[-- Attachment #2: random_writes.ffsb --]

# Large file random writes.
# 104 files, 35MB per file.

time=240
alignio=1

[filesystem0]
	location=/var/tmp/fs_bench
	num_files=104
	min_filesize=36700160  # 35 MB
	max_filesize=36700160
	reuse=1
[end0]

[threadgroup0]
	num_threads=2

	write_random=1
	write_weight=1

	write_size=1048576  # 1 MB
	write_blocksize=4096

	[stats]
		enable_stats=1
		enable_range=1

		msec_range    0.00      0.01
		msec_range    0.01      0.02
		msec_range    0.02      0.05
		msec_range    0.05      0.10
		msec_range    0.10      0.20
		msec_range    0.20      0.50
		msec_range    0.50      1.00
		msec_range    1.00      2.00
		msec_range    2.00      5.00
		msec_range    5.00     10.00
		msec_range   10.00     20.00
		msec_range   20.00     50.00
		msec_range   50.00    100.00
		msec_range  100.00    200.00
		msec_range  200.00    500.00
		msec_range  500.00   1000.00
		msec_range 1000.00   2000.00
		msec_range 2000.00   5000.00
		msec_range 5000.00  10000.00
	[end]
[end0]

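To put the two ffsb runs above on a common scale, the relative drop in transactions per second can be computed directly from the posted Trans/sec figures. This is a minimal sketch; the helper name relative_drop is hypothetical, and the only inputs are the four numbers from the results.

```python
def relative_drop(fua: float, no_fua: float) -> float:
    """Percentage by which the FUA figure falls below the no-FUA figure."""
    return (no_fua - fua) / no_fua * 100.0


# Trans/sec values copied from the two benchmark runs: (FUA, no FUA)
runs = {
    "30 s, 1 thread": (1183.44, 1243.47),
    "240 s, 2 threads": (919.45, 1097.31),
}

for label, (fua, no_fua) in runs.items():
    print(f"{label}: FUA is {relative_drop(fua, no_fua):.1f}% slower")
```

With these inputs, FUA comes out about 4.8% slower in the short single-threaded run and about 16.2% slower in the longer two-threaded run. These are single runs on one drive, so the figures are indicative rather than conclusive.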
* Re: libata default FUA support
From: Robert Hancock @ 2011-03-03 4:33 UTC
To: Michael Tokarev; +Cc: Markus Trippelsdorf, Jeff Garzik, linux-ide

On Wed, Mar 2, 2011 at 1:30 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> 02.03.2011 03:54, Robert Hancock wrote:
>> On 03/01/2011 02:33 PM, Markus Trippelsdorf wrote:
>>> FUA support is currently switched off by default in
>>> drivers/ata/libata-core.c.
>>> Given that many modern drives do support FUA now, wouldn't it make sense
>>> to switch it on without setting an (undocumented) kernel/module
>>> parameter?
>
> After reading your email, Markus, I rebooted two of my home boxes
> after adding libata.fua=1 to the kernel command line. To my surprise,
> only one of my 3 drives, the oldest, supports it. I have two WDs:
> one is the famous WD20EARS (first series with "advanced format",
> i.e. 4 KB sectors, 2 TB), which is less than half a year old, and
> the other is a WD7500AACS, 750 GB, their previous-generation variant;
> both are from the "green" series. The third is from Hitachi, one of
> their "enterprise" series, a 500 GB HUA7210, bought about 3 years ago.
> Of the 3, only the Hitachi reports "supports DPO and FUA" after
> rebooting with fua=1.

That only refers to the non-NCQ FUA support. FUA support for NCQ
appears to be mandatory, but libata doesn't currently take this into
account (i.e. FUA is only reported if the drive reports that the
non-NCQ FUA commands are supported).

>
>> I believe I proposed this some time ago. Essentially all modern drives
>> should support FUA now, since it's part of the definition of the NCQ
>> (FPDMA) read/write commands. However, as I recall one of the objections
>> to enabling it was that since it's just a bit in a command, there's a
>> possibility that some drives may ignore it by accident or design, which
>> is less likely with an explicit cache flush command. I'm not very
>> inclined to agree myself (if you go down that road of pre-emptively
>> predicting drive implementer stupidity, where do you stop?) but that's
>> what was raised.
>
> This is interesting given the above: the WDs I have definitely support
> NCQ, and do that quite well (their scalability is a bit better than
> that of the Hitachi), but do not support FUA, or at least Linux
> treats them as such.
>
>> Another complication is that NCQ can be disabled at runtime either by
>> user request or by error-handling fallback, and not all drives that
>> support NCQ also support the FUA versions of the non-NCQ read/write
>> commands, so changes in NCQ enable status may also need to result in
>> changes in FUA support status on the block device.
>
> Well, the only way to find out is to actually try to enable it.
> So far, the Hitachi drive (which is the main drive on my
> workstation -- system, development, compilation etc.) works
> without issues, and kernel compile time dropped by about 2%
> (I haven't run proper tests so far, so that 2% may be just
> random noise; I will take a closer look at this in a few days).
>
>> I believe the way the block layer uses it, basically it only saves the
>> overhead of one transaction to the drive. It might be significant on
>> some workloads (especially on high IOPS drives like SSDs) but it's
>> likely not a huge deal.
>
> One transaction per what? If it means an extra, especially "large"
> transaction (like a flush with a wait) per each fsync-like call,
> that can actually be a huge deal, especially on database-like
> workloads (lots of small synchronous random writes).
>
> Thanks!
>
> /mjt
>

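The difference between the reporting behaviour described in this message and an NCQ-aware alternative can be sketched as a small decision function. Everything here is illustrative: the Drive class, its field names, and both policy functions are hypothetical and are not libata code. The first policy models "FUA is only reported if the drive supports the non-NCQ FUA commands"; the second models the idea raised earlier in the thread, where FUA status would follow the NCQ enable state.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    supports_ncq: bool          # drive advertises NCQ (FPDMA) commands
    supports_non_ncq_fua: bool  # drive advertises the non-NCQ FUA write commands
    ncq_enabled: bool = True    # may be cleared by the user or by error handling


def fua_current_policy(d: Drive, fua_param: bool) -> bool:
    """Report FUA only if the parameter is set and non-NCQ FUA commands exist."""
    return fua_param and d.supports_non_ncq_fua


def fua_ncq_aware_policy(d: Drive, fua_param: bool) -> bool:
    """Hypothetical alternative: also count the FUA bit carried by NCQ commands."""
    if not fua_param:
        return False
    if d.supports_ncq and d.ncq_enabled:
        return True
    # NCQ unavailable or turned off: fall back to the non-NCQ FUA commands.
    return d.supports_non_ncq_fua


if __name__ == "__main__":
    ncq_only = Drive(supports_ncq=True, supports_non_ncq_fua=False)
    print(fua_current_policy(ncq_only, True), fua_ncq_aware_policy(ncq_only, True))
    ncq_only.ncq_enabled = False
    print(fua_ncq_aware_policy(ncq_only, True))
```

Under the second policy, a drive that has NCQ but lacks the non-NCQ FUA commands loses FUA as soon as NCQ is disabled, which is the runtime complication mentioned earlier in the thread: the block device's FUA status would have to be updated whenever the NCQ state changes.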