* SSD write latency lower than read latency
From: Georg Schönberger @ 2014-12-15 11:48 UTC
To: fio

Hi Fio users and Jens,

I am currently analyzing the read and write latencies of Solid State Disks (SSDs). My assumption was that write latencies are usually higher than read latencies. But my tests with fio on an Intel DC S3500 show that the average latency while writing is lower than while reading (using fio-2.1.3):

* $ sudo /usr/bin/fio --rw=randrw --name=intelDCS3500 --bs=4k --direct=1 --filename=/dev/sda --rwmixread=0 --numjobs=1 --ioengine=libaio --runtime=60 --iodepth=1 --time_based
  [...]
  lat (usec): min=30, max=1010, avg=43.32, stdev= 3.42
  [...]
* $ sudo /usr/bin/fio --rw=randrw --name=intelDCS3500 --bs=4k --direct=1 --filename=/dev/sda --rwmixread=100 --numjobs=1 --ioengine=libaio --runtime=60 --iodepth=1 --time_based
  [...]
  lat (usec): min=103, max=2315, avg=121.92, stdev=11.02

I have made this observation on two different machines with different SSDs, always with similar results.
Am I doing anything wrong with my tests?
Are write latencies in general lower than read latencies?

One guess I have is an SSD cache that speeds up write access (in particular on the Intel datacenter SSDs, as they have an "Enhanced power-loss data protection" feature).

Thanks a lot,
Georg

--
: Georg Schönberger
: Web Operations & Knowledge Transfer
: Thomas-Krenn.AG | The server-experts
: http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki
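To compare the two runs side by side, the latency summary lines can be pulled out of fio's human-readable output. The first run (--rwmixread=0) is 100% random writes, the second (--rwmixread=100) is 100% random reads. This is a minimal sketch that assumes exactly the "lat (usec): min=..., max=..., avg=..., stdev=..." format shown above; other fio versions may format these lines differently or pick other units (e.g. nsec). parse_lat is a hypothetical helper, not part of fio.

```python
import re

# Matches fio's total-latency summary line, e.g.
#   lat (usec): min=30, max=1010, avg=43.32, stdev= 3.42
# The leading \b keeps it from matching inside "slat (...)" or "clat (...)".
LAT_RE = re.compile(
    r"\blat \((?P<unit>[a-z]+)\):\s*"
    r"min=\s*(?P<min>[\d.]+),\s*max=\s*(?P<max>[\d.]+),\s*"
    r"avg=\s*(?P<avg>[\d.]+),\s*stdev=\s*(?P<stdev>[\d.]+)"
)


def parse_lat(line: str) -> dict:
    """Return min/max/avg/stdev (as floats) plus the unit from one fio lat line."""
    m = LAT_RE.search(line)
    if m is None:
        raise ValueError(f"no fio total-latency summary in: {line!r}")
    stats = {key: float(m.group(key)) for key in ("min", "max", "avg", "stdev")}
    stats["unit"] = m.group("unit")
    return stats


# The two summary lines from the runs above (write run first, read run second).
write = parse_lat("lat (usec): min=30, max=1010, avg=43.32, stdev= 3.42")
read = parse_lat("lat (usec): min=103, max=2315, avg=121.92, stdev=11.02")

ratio = read["avg"] / write["avg"]
print(f"read/write avg latency ratio: {ratio:.2f}")  # prints 2.81
```

On these numbers the average QD1 random read takes roughly 2.8 times as long as the average QD1 random write.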
* Re: SSD write latency lower than read latency
From: Jens Axboe @ 2014-12-15 15:15 UTC
To: Georg Schönberger, fio

On 12/15/2014 04:48 AM, Georg Schönberger wrote:
> [...]
> One guess I have is an SSD cache that speeds up write access (in particular on the Intel datacenter SSDs, as they have
> an "Enhanced power-loss data protection" feature).

Your guess is exactly right; that's what most flash-based devices (worth their salt) do. That's also why sync write latencies are mostly independent of the type of NAND used, whereas the read latency clearly reflects it.

--
Jens Axboe
* Re: SSD write latency lower than read latency
From: Matthew Eaton @ 2014-12-17 0:49 UTC
To: Jens Axboe
Cc: Georg Schönberger, fio

Just FYI, this is also the case with the DC S3700, based on my testing.

On Mon, Dec 15, 2014 at 7:15 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/15/2014 04:48 AM, Georg Schönberger wrote:
>> [...]
>> One guess I have is an SSD cache that speeds up write access [...]
>
> Your guess is exactly right; that's what most flash-based devices (worth
> their salt) do. That's also why sync write latencies are mostly independent
> of the type of NAND used, whereas the read latency clearly reflects it.
* Re: SSD write latency lower than read latency
From: Erwan Velu @ 2014-12-17 10:14 UTC
To: Jens Axboe, Georg Schönberger, fio

On 15/12/2014 16:15, Jens Axboe wrote:
> Your guess is exactly right; that's what most flash-based devices
> (worth their salt) do. That's also why sync write latencies are mostly
> independent of the type of NAND used, whereas the read latency
> clearly reflects it.

But here the runtime is limited to only 60 seconds. I suspect that with a much longer run, the cache would no longer be enough to hide the real latency of the device. According to teardowns of the device, the cache is said to be 1 GB. If we push the device with a bigger iodepth and a longer run, we may be able to expose the performance of the NAND itself: once the cache receives new data faster than it can write it out, it fills up, and if we manage to fill it completely, we are measuring the NAND. I had such a case with a poor MLC drive (128 GB) that had 500 MB of SLC cache; with some access patterns I was hitting the MLC at 5 MB/s...

Note that in their specs, the write latency (65 µs) is very close to the read latency (50 µs):
http://ark.intel.com/products/75679/Intel-SSD-DC-S3500-Series-160GB-2_5in-SATA-6Gbs-20nm-MLC

The PDF (http://www.intel.fr/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-s3500-spec.pdf) also shows, in its QoS table, that writes are rated slower than reads (up to 10x at iodepth=32).
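Erwan's argument is essentially a fill-rate calculation: a write buffer only hides NAND latency while the host writes no faster than the device can drain it. A back-of-the-envelope sketch, assuming a single buffer that fills at a constant net rate (real drives drain, garbage-collect and throttle in far more complex ways). seconds_until_cache_full is a hypothetical helper; the 500 MB cache and 5 MB/s drain rate come from Erwan's MLC anecdote, while the 100 MB/s host write rate is an invented example value.

```python
import math

MB = 1_000_000  # decimal megabyte


def seconds_until_cache_full(cache_bytes: float, ingest_bps: float, drain_bps: float) -> float:
    """Toy model: the buffer grows at (ingest - drain) bytes per second.

    Returns math.inf when the drain keeps up, i.e. the buffer never fills and
    the host keeps seeing cache latency rather than NAND latency.
    """
    net_fill_bps = ingest_bps - drain_bps
    if net_fill_bps <= 0:
        return math.inf
    return cache_bytes / net_fill_bps


# 500 MB SLC cache draining to MLC at ~5 MB/s (Erwan's anecdote);
# 100 MB/s of incoming host writes is a hypothetical value.
t = seconds_until_cache_full(500 * MB, 100 * MB, 5 * MB)
print(f"cache full after ~{t:.1f} s")  # prints "cache full after ~5.3 s"
```

Under this model the fill time shrinks as the host write rate rises, which is why Erwan suggests a bigger iodepth as well as a longer run.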
* Re: SSD write latency lower than read latency
From: Jens Axboe @ 2014-12-17 15:00 UTC
To: Erwan Velu, Georg Schönberger, fio

On 12/17/2014 03:14 AM, Erwan Velu wrote:
> [...]
> we also see in the QoS sheet that writes are said to be slower than
> reads (up to 10x with iodepth=32).

Yes, that's a given. There's a potentially huge difference between the single sync write latency (which can be shaved down to the cost of issue + irq + complete + wakeup) and, e.g., writes at steady state, where the device may have to delay or stall writes if garbage collection (GC) can't keep up.

--
Jens Axboe
* Re: SSD write latency lower than read latency
From: Georg Schönberger @ 2014-12-20 8:26 UTC
To: fio
Cc: Erwan Velu, Jens Axboe

----- Original Message -----
> From: "Jens Axboe" <axboe@kernel.dk>
> To: "Erwan Velu" <erwan@enovance.com>, "Georg Schönberger" <gschoenberger@thomas-krenn.com>, fio@vger.kernel.org
> Sent: Wednesday, 17 December, 2014 4:00:17 PM
> Subject: Re: SSD write latency lower than read latency
>
> [...]
> Yes, that's a given. There's a potentially huge difference between the
> single sync write latency (which can be shaved down to the cost of issue
> + irq + complete + wakeup) and, e.g., writes at steady state, where the
> device may have to delay or stall writes if GC can't keep up.

Thanks for confirming the write cache; it's always good to know where things come from. Regarding steady state and GC, I am testing according to the SNIA specification:
* http://www.snia.org/sites/default/files/SSS_PTS_Enterprise_v1.0.pdf
with TKperf; my report is at
* http://www.thomas-krenn.com/de/wikiDE/images/5/52/TKperf-Report-IntelDCS3500.pdf

Regarding iodepth, I am using 1 job with 1 outstanding IO, as stated in the specification, to circumvent IO scheduler influences. I thought higher queue depths always lead to higher latencies, correct? (https://www.kernel.org/doc/Documentation/block/stat.txt)
Therefore, testing with 1 job and iodepth=1 should produce comparable latency results, or not?

Another question: is there a way to turn off this cache? It does not seem to be the regular device write cache: I turned that off with "hdparm -W" and the latencies stayed the same (in a quick test).

- Georg
* Re: SSD write latency lower than read latency
From: Alireza Haghdoost @ 2014-12-20 16:38 UTC
To: Georg Schönberger
Cc: fio@vger.kernel.org, Erwan Velu, Jens Axboe

> Regarding iodepth, I am using 1 job with 1 outstanding IO, as stated in the specification,
> to circumvent IO scheduler influences. I thought higher queue depths always lead to
> higher latencies, correct? (https://www.kernel.org/doc/Documentation/block/stat.txt)
> Therefore, testing with 1 job and iodepth=1 should produce comparable latency results, or not?

In the report you mention an IO depth of 16. Do you mean you used 16 for the throughput and IOPS tests and then reduced the depth to 1 for the latency tests? While your statement about the impact of queue depth on latency makes sense to me, it is hard to relate IOPS numbers measured at an IO depth of 16 to latency numbers measured at an IO depth of 1.

> Another question: is there a way to turn off this cache?
> It does not seem to be the regular device write cache: I turned that off with "hdparm -W"
> and the latencies stayed the same (in a quick test).

I believe it is vendor specific; most drives don't honor such requests, or even cache flush requests.
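One standard way to connect the two kinds of numbers is Little's law: with a steady queue of QD outstanding I/Os, mean latency = QD / IOPS. The sketch below assumes fio keeps the queue full at the configured iodepth for the whole run; the QD1 input is Georg's avg=43.32 usec write result, while the 80,000 IOPS figure is purely hypothetical and not taken from the TKperf report.

```python
def iops_from_latency(queue_depth: int, mean_latency_us: float) -> float:
    """Little's law rearranged: IOPS = QD / mean latency."""
    return queue_depth / (mean_latency_us * 1e-6)


def latency_from_iops(queue_depth: int, iops: float) -> float:
    """Little's law: mean latency (microseconds) = QD / IOPS."""
    return queue_depth / iops * 1e6


# Georg's QD1 4k random-write average (avg=43.32 usec) implies:
qd1_iops = iops_from_latency(1, 43.32)
print(f"QD1: ~{qd1_iops:,.0f} IOPS")  # prints "QD1: ~23,084 IOPS"

# Hypothetical: if a QD16 run reported 80,000 IOPS, its mean latency would be
qd16_lat = latency_from_iops(16, 80_000)
print(f"QD16: ~{qd16_lat:.0f} us")  # prints "QD16: ~200 us"
```

Seen this way, an IOPS figure measured at QD16 already implies its own mean latency (16 / IOPS), which is generally not the same operating point as a latency measured at QD1.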
* Re: SSD write latency lower than read latency
From: Jens Axboe @ 2014-12-20 19:33 UTC
To: Georg Schönberger, fio
Cc: Erwan Velu

On 12/20/2014 01:26 AM, Georg Schönberger wrote:
> [...]
> Regarding iodepth, I am using 1 job with 1 outstanding IO, as stated in the specification,
> to circumvent IO scheduler influences. I thought higher queue depths always lead to
> higher latencies, correct? (https://www.kernel.org/doc/Documentation/block/stat.txt)
> Therefore, testing with 1 job and iodepth=1 should produce comparable latency results, or not?

That's not necessarily true. There's a saturation point beyond which a higher depth will cause higher latencies, but until you reach that point, it's not uncommon for latencies to decrease slightly as you raise the depth, because certain costs can be amortized across multiple IOs. Once increasing the queue depth no longer makes the device go any faster, increased latencies are expected.

> Another question: is there a way to turn off this cache?
> It does not seem to be the regular device write cache: I turned that off with "hdparm -W"
> and the latencies stayed the same (in a quick test).

It's not as simple as that. Some devices use a bigger buffer, roughly like a writeback cache on hard drives; these are often the ones with a larger supercap for power-cut safety, enabling the device to keep running for many seconds while the buffer is drained. Others may simply have a smaller page buffer that they stream writes into as part of the design, needing much less power-cut backing to stream it out to non-volatile flash. The point is that cache setups can be very different and can be inherently tied to the architecture of the device, so there's no generic way to utilize them or to turn them off. Of the devices that have more of a classic, bigger write cache, some may come with vendor tools that allow you to switch them to write-through.

--
Jens Axboe
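Jens's saturation point can be illustrated with a deliberately crude model: below saturation each extra outstanding I/O adds throughput at a fixed per-I/O service time, so latency stays flat; once IOPS hit a ceiling, Little's law says extra depth only adds queueing delay. The 120 µs service time and 60,000 IOPS ceiling are invented values, and the model ignores the slight latency drop from amortized per-I/O costs that Jens mentions below saturation.

```python
def toy_mean_latency_us(queue_depth: int, service_time_us: float, max_iops: float) -> float:
    """Crude queueing sketch, not a model of any real SSD.

    Unsaturated: QD outstanding I/Os each take service_time_us, so IOPS grow
    linearly with QD. Saturated: IOPS stay capped at max_iops, and Little's
    law gives latency = QD / IOPS, which then grows linearly with QD.
    """
    unsaturated_iops = queue_depth / (service_time_us * 1e-6)
    iops = min(unsaturated_iops, max_iops)
    return queue_depth / iops * 1e6


# Hypothetical device: 120 us per I/O, at most 60,000 IOPS (knee at QD = 7.2).
for qd in (1, 2, 4, 8, 16, 32):
    lat = toy_mean_latency_us(qd, service_time_us=120.0, max_iops=60_000)
    print(f"QD{qd:>2}: {lat:6.1f} us")
```

With these made-up parameters the mean latency stays at 120 µs up to QD4, then rises to about 133 µs at QD8 and about 533 µs at QD32, because the device is no longer getting any faster.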