* bcache not working on large files?
@ 2013-10-22 7:08 Dirk Geschke
0 siblings, 1 reply; 9+ messages in thread
From: Dirk Geschke @ 2013-10-22 7:08 UTC (permalink / raw)
To: linux-bcache-u79uwXL29TY76Z2rM5mHXA
Hi all,
I was just playing a little bit with bcache and it works fine. But
if I try random writes (writeback) on a file larger than the
cache, it seems not to work: I get the same performance as without
bcache.
I used a RAID-6 of 8 SSDs (238 GiB each) as a cache and a RAID-6 of
10 HDDs (2794 GiB each). For my test I did random writes of 4k blocks.
With 8 threads writing to one 1000 GB file, I get IOPS in the range
of 20,000, but with a 10000 GB file I end up with about 400 IOPS.
That is the same as without cache.
Did I miss something? Is caching disabled in such cases?
I would expect that even in this case (writeback and random writes)
it should work. Even after a few minutes I get these results,
so it is not a problem of a full cache.
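A quick back-of-the-envelope check of the numbers above (a sketch, not
from the mail itself; it assumes RAID-6 usable capacity of n-2 disks):

```python
# Rough capacity check for the setup described in the thread.
# Assumption (not stated in the mail): RAID-6 usable capacity is
# (n - 2) * disk_size, and for random writes to keep hitting the
# SSDs, the benchmark's working set must fit into the cache.

GIB = 2**30
GB = 10**9

ssd_count, ssd_size = 8, 238 * GIB
cache_usable = (ssd_count - 2) * ssd_size  # RAID-6: two parity disks

small_file = 1000 * GB
large_file = 10000 * GB

print(f"usable cache: {cache_usable / GB:.0f} GB")
print(f"1000 GB file fits in cache: {small_file < cache_usable}")
print(f"10000 GB file fits in cache: {large_file < cache_usable}")
```

Under this assumption the cache holds roughly 1.5 TB, so the 1000 GB
working set fits entirely while the 10000 GB one is over six times
larger than the cache.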
Does anyone have a hint for me as to what is going wrong?
Best regards
Dirk
--
+----------------------------------------------------------------------+
| Dr. Dirk Geschke / Plankensteinweg 61 / 85435 Erding |
| Telefon: 08122-559448 / Mobil: 0176-96906350 / Fax: 08122-9818106 |
| dirk-tpR6ahGJfjwEZ6m2XrtfILNAH6kLmebB@public.gmane.org / dirk-WMiem5eIfR0n5izryJqWLw@public.gmane.org / kontakt-WMiem5eIfR0n5izryJqWLw@public.gmane.org |
+----------------------------------------------------------------------+
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: bcache not working on large files?
@ 2013-10-22 17:35 ` Rolf Fokkens
0 siblings, 1 reply; 9+ messages in thread
From: Rolf Fokkens @ 2013-10-22 17:35 UTC (permalink / raw)
To: Dirk Geschke, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Dirk,

On 10/22/2013 09:08 AM, Dirk Geschke wrote:
> I was just playing a little bit with bcache and it works fine. But
> if I try random writes (writeback) on a file larger than the
> cache, it seems not to work: I get the same performance as without
> bcache.
>
> Did I miss something? Is caching disabled in such cases?
>
> Does anyone have a hint for me as to what is going wrong?

Bcache has some specific handling of sequential I/O:

http://evilpiepirate.org/git/linux-bcache.git/tree/Documentation/bcache.txt

Could this explain what you're seeing?

Rolf
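For reference, the sequential-I/O handling Rolf points at is the
sequential cutoff, which can be inspected and disabled via sysfs (a
sketch; the device name bcache0 is a placeholder for the actual
device):

```shell
# Show the current sequential cutoff: I/O in runs larger than this
# is considered sequential and bypasses the cache (default 4 MB).
cat /sys/block/bcache0/bcache/sequential_cutoff

# Set the cutoff to 0 to cache everything, sequential or not.
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
```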
* Re: bcache not working on large files?
@ 2013-10-22 18:03 ` Dirk Geschke
0 siblings, 1 reply; 9+ messages in thread
From: Dirk Geschke @ 2013-10-22 18:03 UTC (permalink / raw)
To: Rolf Fokkens; +Cc: Dirk Geschke, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Rolf,

> > I was just playing a little bit with bcache and it works fine. But
> > if I try random writes (writeback) on a file larger than the
> > cache, it seems not to work: I get the same performance as without
> > bcache.
> >
> > Did I miss something? Is caching disabled in such cases?
> >
> > Does anyone have a hint for me as to what is going wrong?
>
> Bcache has some specific handling of sequential I/O:
>
> http://evilpiepirate.org/git/linux-bcache.git/tree/Documentation/bcache.txt
>
> Could this explain what you're seeing?

No, I was explicitly testing random writes of 4k blocks, no
sequential writing. With a 1000 GB file it does work, but with a
10000 GB file it seems to fail. I would expect that the size should
not really matter here, at least until the cache fills up.

The only thing I can imagine is a problem with the RAID controller.
Both RAIDs (HDDs and SSDs) are on the same controller. Maybe the
controller slows down the SSD cache while it writes to the HDDs?

Hmm, maybe I should run two tests in parallel: random 4k writes to a
1000 GB file on the SSD RAID, and random writes on the HDD RAID at
the same time. Maybe the random I/O on the 10000 GB file keeps the
controller busy and slows down the SSD RAID? I will see if I can
test it. So maybe it is not an issue of bcache at all...

Best regards

Dirk

--
+----------------------------------------------------------------------+
| Dr. Dirk Geschke / Plankensteinweg 61 / 85435 Erding                  |
| Telefon: 08122-559448 / Mobil: 0176-96906350 / Fax: 08122-9818106    |
| dirk-tpR6ahGJfjwEZ6m2XrtfILNAH6kLmebB@public.gmane.org / dirk-WMiem5eIfR0n5izryJqWLw@public.gmane.org / kontakt-WMiem5eIfR0n5izryJqWLw@public.gmane.org |
+----------------------------------------------------------------------+
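The benchmark described in this thread could be reconstructed roughly
like this with fio (a hypothetical sketch: the thread never names the
tool, and the mount point is a placeholder):

```shell
# Hypothetical reconstruction of the test: 8 jobs doing random 4k
# writes into one large file on the bcache-backed filesystem.
# /mnt/bcache is a placeholder for the actual mount point.
fio --name=bcache-randwrite \
    --directory=/mnt/bcache \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --size=1000g --numjobs=8 --group_reporting \
    --time_based --runtime=300
```

Running the same job against the SSD RAID and the HDD RAID at the
same time would be one way to check the controller-contention theory.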
* Re: bcache not working on large files?
@ 2013-10-22 21:11 ` matthew patton
0 siblings, 1 reply; 9+ messages in thread
From: matthew patton @ 2013-10-22 21:11 UTC (permalink / raw)
To: Dirk Geschke, Rolf Fokkens
Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> > http://evilpiepirate.org/git/linux-bcache.git/tree/Documentation/bcache.txt
>
> No, I was explicitly testing random writes of 4k blocks, no
> sequential writing. With a 1000 GB file it does work, but with a
> 10000 GB file it seems to fail. I would expect that the size should
> not really matter here, at least until the cache fills up.

When the SSD gets too slow it will get bypassed by bcache. The
tunable is in the document. Though if memory serves it's a latency of
20ms for writes, which is probably way too short, since SSDs can
easily take 1-5 seconds when they have to resort to heavy lifting.

What you should do is turn off RAID controller READ caching
entirely, and turn OFF writeback caching for the SSD-based LUN(s) at
said controller that are being used as bcache caching devices.

It would be helpful if you elaborated on the HBA and SSD drives you
are using. BTW, doing RAID across the SSDs being used for cache is
rather pointless IMO. You're shortening their life and adding
unnecessary write IOPs. It's a cache. It's supposed to fail. Bcache
will(?) properly handle a busted SSD.

Cache is only useful for absorbing sudden spikes in IOPs or for
highly localized and frequently re-used blocks. It's not intended to
magically improve the underlying storage by 10-100x under a load
that can't be sustained by said layer. Big $$$ SANs have lots of
cache, but at some point you will reach the saturation point and
everything slows to HDD speed.

I also wouldn't futz with the EXT4 settings you had posted
previously while doing benchmark runs, because I expect it gets in
the way and makes performance worse than if the block stream was
less chunky. Only once you have defined a realistic sustained
workload, and know how often and how fast journaled-to-SSD writes
(bcache writeback) can reasonably be de-staged, would I revisit
those tuning parameters and see if there is any merit to them.

Personally I put NVRAM boards in my servers to be filesystem
journals and MD mirror maps. They're incredibly cheap.
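The latency bypass described above can be observed and tuned via
sysfs (a sketch; the cache-set UUID and bcache0 are placeholders for
the actual names on the system):

```shell
# Congestion thresholds in microseconds: bcache throttles traffic to
# the cache device when its latency exceeds these; 0 disables them.
cat /sys/fs/bcache/<cache-set-uuid>/congested_read_threshold_us
cat /sys/fs/bcache/<cache-set-uuid>/congested_write_threshold_us

# How much I/O is actually bypassing the cache:
cat /sys/block/bcache0/bcache/stats_total/bypassed
cat /sys/block/bcache0/bcache/stats_total/cache_bypass_hits
cat /sys/block/bcache0/bcache/stats_total/cache_bypass_misses
```

A large and growing "bypassed" figure during the 10000 GB run would
support the theory that the SSDs are being marked congested.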
* Re: bcache not working on large files?
@ 2013-10-22 21:45 ` Dirk Geschke
1 sibling, 1 reply; 9+ messages in thread
From: Dirk Geschke @ 2013-10-22 21:45 UTC (permalink / raw)
To: matthew patton
Cc: Dirk Geschke, linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Matthew,

> > No, I was explicitly testing random writes of 4k blocks, no
> > sequential writing. With a 1000 GB file it does work, but with a
> > 10000 GB file it seems to fail. I would expect that the size
> > should not really matter here, at least until the cache fills up.
>
> When the SSD gets too slow it will get bypassed by bcache. The
> tunable is in the document. Though if memory serves it's a latency
> of 20ms for writes, which is probably way too short, since SSDs can
> easily take 1-5 seconds when they have to resort to heavy lifting.

Ah, that could be an explanation for this effect. But I do not find
an appropriate parameter in the documentation. Do you have a hint?

> What you should do is turn off RAID controller READ caching
> entirely, and turn OFF writeback caching for the SSD-based LUN(s)
> at said controller that are being used as bcache caching devices.

Hmm, yes, a read cache for SSDs is a kind of overhead. You mean this
may have an impact, too? Actually I am not sure about the settings,
it is likely the default... However, it worked with smaller files
like 1000 GB. There is no change at all in the random I/O but the
size of the file I am using.

> It would be helpful if you elaborated on the HBA and SSD drives you
> are using. BTW, doing RAID across the SSDs being used for cache is
> rather pointless IMO. You're shortening their life and adding
> unnecessary write IOPs. It's a cache. It's supposed to fail. Bcache
> will(?) properly handle a busted SSD.

The documentation says no and recommends a RAID. That's the reason
why it runs in writethrough mode by default. Probably a RAID 5/6 is
not a good idea for a caching SSD RAID, especially when the HDDs
form a RAID 5/6 on the same controller, too. Probably a RAID 10
would be better...

> Cache is only useful for absorbing sudden spikes in IOPs or for
> highly localized and frequently re-used blocks. It's not intended
> to magically improve the underlying storage by 10-100x under a load
> that can't be sustained by said layer. Big $$$ SANs have lots of
> cache, but at some point you will reach the saturation point and
> everything slows to HDD speed.

Yes, I was just testing random I/O. And I had expected that bcache
would work with large files, too.

So 8 threads doing random I/O of 4k blocks in a 1000 GB file result
in IOPS in the range of about 20,000. The same setup with a file of
10,000 GB results in about 400 IOPS. That is about the same rate the
HDDs have without bcache. This is something I did not expect. It
looks like bcache stopped working at all... Maybe your first comment
explains this, so that the SSDs appear too slow. Maybe this is due
to the RAID controller working with the HDDs.

> I also wouldn't futz with the EXT4 settings you had posted
> previously while doing benchmark runs, because I expect it gets in
> the way and makes performance worse than if the block stream was
> less chunky. Only once you have defined a realistic sustained
> workload, and know how often and how fast journaled-to-SSD writes
> (bcache writeback) can reasonably be de-staged, would I revisit
> those tuning parameters and see if there is any merit to them.

Oh, that must be another thread, I am using XFS... ;-)

> Personally I put NVRAM boards in my servers to be filesystem
> journals and MD mirror maps. They're incredibly cheap.

I am just curious where the limits are and in which ranges one can
use bcache. But I did not expect this result, therefore I asked for
a hint. I would have expected it to work as before: the random
writes would fit entirely in the SSD cache, even if the HDDs were
asleep all the time... And yes, I am not convinced that a RAID 5/6
for SSD caches is really a good idea. Hmm, maybe I can use a RAID 10
instead...

Best regards

Dirk

--
+----------------------------------------------------------------------+
| Dr. Dirk Geschke / Plankensteinweg 61 / 85435 Erding                  |
| Telefon: 08122-559448 / Mobil: 0176-96906350 / Fax: 08122-9818106    |
| dirk-tpR6ahGJfjwEZ6m2XrtfILNAH6kLmebB@public.gmane.org / dirk-WMiem5eIfR0n5izryJqWLw@public.gmane.org / kontakt-WMiem5eIfR0n5izryJqWLw@public.gmane.org |
+----------------------------------------------------------------------+
* Re: bcache not working on large files?
@ 2013-10-22 22:27 ` matthew patton
1 sibling, 1 reply; 9+ messages in thread
From: matthew patton @ 2013-10-22 22:27 UTC (permalink / raw)
To: Dirk Geschke; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> > cache. It's supposed to fail. Bcache will(?) properly handle a
> > busted SSD.
>
> The documentation says no and recommends a RAID. That's the reason
> why it runs in writethrough mode by default.

'busted' in this case means errors on write. If the device
disappears and is no longer readable, then sure, any uncommitted
blocks are gone and your filesystem/file is broken. On write failure
(how many?) I would sure hope the code path marks the malfunctioning
SSD as disabled and just sends the writes to the HDD. Whether it
then gets more aggressive about de-staging previously staged writes
I don't know (probably not), but as long as the SSD is readable the
dirty blocks can be written out.

If you're not running kernel 3.11.5 (or 3.11.4 with *all* the bcache
fixes) you're wasting your time.
* RE: bcache not working on large files?
@ 2013-10-23 2:06 ` Paul B. Henson
0 siblings, 0 replies; 9+ messages in thread
From: Paul B. Henson @ 2013-10-23 2:06 UTC (permalink / raw)
To: 'matthew patton', 'Dirk Geschke'
Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

> From: matthew patton
> Sent: Tuesday, October 22, 2013 3:27 PM
>
> 'busted' in this case means errors on write. If the device
> disappears and is no longer readable, then sure, any uncommitted
> blocks are gone and your filesystem/file is broken.

In other words, if you care about your data, don't use a
non-redundant cache device in writeback mode 8-/. I'm tentatively
planning to use an md mirror of two 256G SSDs to front-end a RAID 10
of four 2TB HDs; I'm still in the investigation stage but hopefully
will get to prototyping soon.

> If you're not running kernel 3.11.5 (or 3.11.4 with *all* the
> bcache fixes) you're wasting your time.

Is there an intention to port all of these fixes back to an LTS 3.10
kernel, or if you're going to run in production are you pretty much
committed to the latest stable?

Thanks.
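A setup like the one Paul describes, an md mirror fronting the
backing array, could be sketched like this (hypothetical device
names; the commands follow the mdadm and bcache-tools documentation,
not anything posted in this thread):

```shell
# Mirror two SSDs with md, then use the mirror as the bcache cache
# device. All /dev/sdX and /dev/mdX names are placeholders.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# Format the mirror as a cache device and the HDD array as the
# backing device (here md1 is assumed to be the RAID 10 of HDDs).
make-bcache -C /dev/md0
make-bcache -B /dev/md1

# Attach the backing device to the cache set and enable writeback.
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode
```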
* Re: bcache not working on large files?
@ 2013-10-23 14:47 ` Dirk Geschke
1 sibling, 0 replies; 9+ messages in thread
From: Dirk Geschke @ 2013-10-23 14:47 UTC (permalink / raw)
To: linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi all,

> So 8 threads doing random I/O of 4k blocks in a 1000 GB file result
> in IOPS in the range of about 20,000. The same setup with a file of
> 10,000 GB results in about 400 IOPS. That is about the same rate
> the HDDs have without bcache. This is something I did not expect.
> It looks like bcache stopped working at all...

I have to correct this: I rebuilt the SSD RAID, now using RAID 0
instead of RAID 6 for caching. With that I get about 10,000 IOPS
writing 4k blocks. I think (a) this is ok and (b) it is not a bcache
problem. I suspect it is a problem of the RAID controller, but I
have no clue which one... Probably it is a better decision to use a
separate RAID controller for the SSD (b)cache...

Best regards

Dirk

--
+----------------------------------------------------------------------+
| Dr. Dirk Geschke / Plankensteinweg 61 / 85435 Erding                  |
| Telefon: 08122-559448 / Mobil: 0176-96906350 / Fax: 08122-9818106    |
| dirk-tpR6ahGJfjwEZ6m2XrtfILNAH6kLmebB@public.gmane.org / dirk-WMiem5eIfR0n5izryJqWLw@public.gmane.org / kontakt-WMiem5eIfR0n5izryJqWLw@public.gmane.org |
+----------------------------------------------------------------------+
* Re: bcache not working on large files?
@ 2013-10-22 22:16 ` matthew patton
1 sibling, 0 replies; 9+ messages in thread
From: matthew patton @ 2013-10-22 22:16 UTC (permalink / raw)
To: Dirk Geschke; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> > http://evilpiepirate.org/git/linux-bcache.git/tree/Documentation/bcache.txt
>
> When the SSD gets too slow it will get bypassed by bcache. The
> tunable is in the document. Though if memory serves it's a latency
> of 20ms for writes, which is probably way too short, since SSDs can
> easily take 1-5 seconds when they have to resort to heavy lifting.

From the documentation:

- Traffic's still going to the spindle / still getting cache misses

  In the real world, SSDs don't always keep up with disks -
  particularly with slower SSDs, many disks being cached by one SSD,
  or mostly sequential I/O. So you want to avoid being bottlenecked
  by the SSD and having it slow everything down.

  To avoid that, bcache tracks latency to the cache device, and
  gradually throttles traffic if the latency exceeds a threshold (it
  does this by cranking down the sequential bypass). You can disable
  this if you need to by setting the thresholds to 0:

    # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
    # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us
end of thread, other threads:[~2013-10-23 14:47 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-22 7:08 bcache not working on large files? Dirk Geschke
[not found] ` <20131022070830.GA20286-TGNR7510c4H9eX/OD4jYIkzuTB+3w3uf@public.gmane.org>
2013-10-22 17:35 ` Rolf Fokkens
[not found] ` <5266B774.6040202-6w2rdlBuEQTpMFipWq+H6g@public.gmane.org>
2013-10-22 18:03 ` Dirk Geschke
[not found] ` <20131022180316.GA24473-TGNR7510c4H9eX/OD4jYIkzuTB+3w3uf@public.gmane.org>
2013-10-22 21:11 ` matthew patton
[not found] ` <1382476284.36900.YahooMailNeo-XYahOdtEMNlRBbKmAC7my5OW+3bF1jUfVpNB7YpNyf8@public.gmane.org>
2013-10-22 21:45 ` Dirk Geschke
[not found] ` <20131022214553.GC26186-TGNR7510c4H9eX/OD4jYIkzuTB+3w3uf@public.gmane.org>
2013-10-22 22:27 ` matthew patton
[not found] ` <1382480829.46079.YahooMailNeo-XYahOdtEMNkbWpotXP+qY5OW+3bF1jUfVpNB7YpNyf8@public.gmane.org>
2013-10-23 2:06 ` Paul B. Henson
2013-10-23 14:47 ` Dirk Geschke
2013-10-22 22:16 ` matthew patton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox