* problem w/ read caching..
From: Brad Walker @ 2012-09-12 20:01 UTC
To: linux-bcache

I am having a problem with BCache.
I’ve followed the documentation and have my cache attached. Here is
what dmesg tells me:
[ 372.622905] bcache: invalidating existing data
[ 372.637517] bcache: registered cache device rssda1
[ 400.704672] bcache: Caching dm-2 as bcache0 on set 16fd7139-f018-461c-9d9e-daa
I warmed up the cache by using an application (vdbench) to do random
reads over a 10GB region.
Everything looks good: the response time comes down as the cache warms
up. But for some reason the cache_hit_ratio is showing roughly 90%, and
yet I'm still seeing heavy activity on the disk device.
bwalker@nellis:~> cat
/sys/fs/bcache/16fd7139-f018-461c-9d9e-daa7666c7f1e/stats_total/cache_hit_ratio
94
bwalker@nellis:~>
Any ideas on why this might be happening are appreciated.
-brad w.
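
To pin down where that disk activity is going, per-device statistics are
the quickest check. A minimal sketch using sysstat's iostat, assuming the
device names from the dmesg output above (dm-2 is the backing device,
rssda1 the cache SSD); exact invocation may vary with the sysstat version:

# extended stats at 1-second intervals for the backing device,
# the cache SSD, and the composite bcache device
iostat -x dm-2 rssda1 bcache0 1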
* Re: problem w/ read caching..
From: Kent Overstreet @ 2012-09-13 18:43 UTC
To: Brad Walker; +Cc: linux-bcache

Sounds like a good portion of your IO is bypassing the cache. That will
happen if some of it's sequential, or if the SSD latency goes over a
threshold - sequential_cutoff, congested_read_threshold_us and
congested_write_threshold_us (if I'm remembering the names correctly)
are the settings that control all that. 0 disables all of them.
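
For reference, disabling those heuristics amounts to writing 0 into the
corresponding sysfs files. A minimal sketch, assuming the backing device
registered as bcache0 and the cache-set UUID from the dmesg output above
(exact paths can differ slightly between bcache versions):

# per backing device:
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
# per cache set:
echo 0 > /sys/fs/bcache/16fd7139-f018-461c-9d9e-daa7666c7f1e/congested_read_threshold_us
echo 0 > /sys/fs/bcache/16fd7139-f018-461c-9d9e-daa7666c7f1e/congested_write_threshold_us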
* Re: problem w/ read caching..
From: Brad Walker @ 2012-09-27 23:28 UTC
To: linux-bcache

Kent Overstreet <koverstreet@...> writes:
>
> Sounds like a good portion of your IO is bypassing the cache. That will
> happen if some of it's sequential, or if the SSD latency goes over a
> threshold - sequential_cutoff, congested_read_threshold_us and
> congested_write_threshold_us (if I'm remembering the names correctly)
> are the settings that control all that. 0 disables all of them.

So I set sequential_cutoff and congested_read_threshold_us both to 0.
Since I was only doing reads, I figured there was no need to mess with
the write option.

But I'm still seeing a problem.

My hardware is:
1 - Dell PowerEdge R710 w/ 24 x Xeon processors, 96GB of RAM
2 - Micron P320H SSD
3 - LSI storage device connected over a SAS interface

What I see is that when I do random reads over a 10GB region, the cache
warms up but hits a read-response plateau at about 7ms. I still see a
LOT (i.e. 32000 IOPS) of I/O to the disk.

If I run the same test over a 1GB region, it runs really fast - pretty
close to the max IOPS rate of the SSD.

So I'm thinking there is either a problem here or I have a bcache config
issue. I'm willing to try things, but I need some guidance on what to
look for, as it seems like a bcache issue.

Thanks for the help.

-brad w.
* Re: problem w/ read caching..
From: Kent Overstreet @ 2012-09-28 18:59 UTC
To: Brad Walker; +Cc: linux-bcache

On Thu, Sep 27, 2012 at 11:28:20PM +0000, Brad Walker wrote:
> What I see is that when I do random reads over a 10GB region, the cache
> warms up but hits a read-response plateau at about 7ms. I still see a
> LOT (i.e. 32000 IOPS) of I/O to the disk.

By disk do you mean the spinning disk, or just the bcache device?

I'm wondering if your storage array really is that fast (which would
explain the 7ms) or if something weird is going on.

Cache hit ratio or iostat would tell you.

> If I run the same test over a 1GB region, it runs really fast - pretty
> close to the max IOPS rate of the SSD.
>
> So I'm thinking there is either a problem here or I have a bcache config
> issue.

Sounds like some sort of bcache problem, hrm.

The most likely cause is that something is keeping the cache from warming
up, and some IO is still going to disk. That used to be an issue with the
old synchronization for updating the cache on a cache miss, but it
shouldn't be anymore...

Check the number of cache misses after a run - if it's going up when all
the data should be in the cache, that's one bug. If there are no cache
misses and you're still seeing 7ms latency... well, that would be weird.
Queueing delays, maybe.
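
One simple way to track those counters between runs is to dump the whole
stats directory before and after a test; a sketch, with the cache-set UUID
left as a placeholder to substitute:

# one value per line, prefixed by the counter (file) name
grep . /sys/fs/bcache/<cache-set-uuid>/stats_total/*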
* Re: problem w/ read caching..
From: Brad Walker @ 2012-10-01 19:18 UTC
To: linux-bcache

Kent Overstreet <koverstreet@...> writes:
>
> By disk do you mean the spinning disk, or just the bcache device?
>
> I'm wondering if your storage array really is that fast (which would
> explain the 7ms) or if something weird is going on.
>
> Cache hit ratio or iostat would tell you.

The cache_hit_ratio is 99%. And yet, iostat still shows I/O running to
the RAID array.

> Check the number of cache misses after a run - if it's going up when all
> the data should be in the cache, that's one bug. If there are no cache
> misses and you're still seeing 7ms latency... well, that would be weird.
> Queueing delays, maybe.

After running my tests with the cache fully warmed, the cache_hit_ratio
goes to 99%.

Yet cache misses are stable and not changing. Cache hits are increasing,
and iostat is still showing 32K blocks being read from disk.

Any ideas on how to debug this?

Thanks for the help.

-brad w.
* Re: problem w/ read caching..
From: Kent Overstreet @ 2012-10-01 19:38 UTC
To: Brad Walker; +Cc: linux-bcache

On Mon, Oct 01, 2012 at 07:18:40PM +0000, Brad Walker wrote:
> Yet cache misses are stable and not changing. Cache hits are increasing,
> and iostat is still showing 32K blocks being read from disk.
>
> Any ideas on how to debug this?

What about cache_bypass_hits, cache_bypass_misses?
* Re: problem w/ read caching..
From: Brad Walker @ 2012-10-01 20:05 UTC
To: linux-bcache

Kent Overstreet <koverstreet@...> writes:
>
> What about cache_bypass_hits, cache_bypass_misses?

cache_bypass_hits = 0
cache_bypass_misses = 0
* Re: problem w/ read caching..
From: Kent Overstreet @ 2012-10-01 20:37 UTC
To: Brad Walker; +Cc: linux-bcache

On Mon, Oct 01, 2012 at 08:05:14PM +0000, Brad Walker wrote:
> cache_bypass_hits = 0
> cache_bypass_misses = 0

I should've just asked you for all the stats - what about
cache_miss_collisions? Also, internal/cache_read_races?

Perhaps stuff is getting evicted from the cache for some reason... How
big is the SSD?

Is cache_replacement_policy lru? (The default - cache_replacement_policy
is in cache0/.)

What does cache0/priority_stats say?
* Re: problem w/ read caching..
From: Brad Walker @ 2012-10-01 20:56 UTC
To: linux-bcache

Kent Overstreet <koverstreet@...> writes:
>
> I should've just asked you for all the stats - what about
> cache_miss_collisions?

bwalker:/sys/fs/bcache/dd10d09c-0605-462c-af85-8466b0aa2017/stats_total> ls
bypassed           cache_bypass_misses  cache_hits             cache_misses
cache_bypass_hits  cache_hit_ratio      cache_miss_collisions  cache_readaheads
bwalker:/sys/fs/bcache/dd10d09c-0605-462c-af85-8466b0aa2017/stats_total> cat *
0
0
0
98
162315691
0
2329081
0
bwalker:/sys/fs/bcache/dd10d09c-0605-462c-af85-8466b0aa2017/stats_total>

> Also, internal/cache_read_races?

cat /sys/fs/bcache/dd10d09c-0605-462c-af85-8466b0aa2017/internal/cache_read_races
0

> Perhaps stuff is getting evicted from the cache for some reason... How
> big is the SSD?

nellis: # sg_inq /dev/rssda
ATA device: model, serial number and firmware revision:
      Micron P320h-MTFDGAR350SAH   000000001143020287B2   B1490300
nellis: # fdisk /dev/rssda

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

Command (m for help): p

Disk /dev/rssda: 350.1 GB, 350078754816 bytes
210 heads, 56 sectors/track, 58141 cylinders, total 683747568 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x16e1596f

     Device Boot      Start         End      Blocks   Id  System
/dev/rssda1            2048   683747567   341872760   83  Linux

Command (m for help):

> Is cache_replacement_policy lru? (The default - cache_replacement_policy
> is in cache0/.)

LRU

> What does cache0/priority_stats say?

nellis: # cat priority_stats
Unused:         97%
Metadata:       0%
Average:        0
Sectors per Q:  582336
Quantiles:      [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1]
nellis: #
* Re: problem w/ read caching..
From: Kent Overstreet @ 2012-10-01 21:14 UTC
To: Brad Walker; +Cc: linux-bcache

So cache_miss_collisions and cache_read_races are 0...

----

I was just browsing around the code, and I bet I know what it is -
btree_insert_check_key() is failing because the btree node is full.

The way the code works is that on a cache miss, we can't just blindly
insert that data into the cache: if a write happens to the same location
after the cache miss but before the data from the cache miss gets
inserted, we'd overwrite the write with stale data.

So btree_insert_check_key() inserts a fake key atomically with the cache
miss - we don't need that key to be persisted, so we can skip journalling
and all the normal btree insert code, which is how we can insert this
fake key atomically. Then, when we go to insert the real key that points
to the data from the cache miss, we check whether the fake key we
inserted is still present, and fail the insert if it's not. It's
cmpxchg(), but for the btree.

Anyways... since we're skipping all the normal btree_insert() code,
btree_insert_check_key() can't split the btree node if it's full - if the
btree node is full it just fails. This would be perfectly fine in any
normal workload where you've got some mix of reads and writes: if the
btree node is full, a write will come along to split it. But the
synthetic workload is a bit of a pathological case here :)

But we should confirm this really is what's going on... Can you apply
this patch and rerun to test my theory? See if the number of times the
printk fires lines up with the number of cache misses.

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 4102267..d5c5313 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1875,9 +1875,13 @@ bool bch_btree_insert_check_key(struct btree *b, struct btree_op *op,
 	rw_unlock(false, b);
 	rw_lock(true, b, b->level);
 
+	if (should_split(b)) {
+		printk(KERN_DEBUG "bcache: bch_btree_insert_check_key() failed because btree node full\n");
+		goto out;
+	}
+
 	if (b->key.ptr[0] != btree_ptr ||
-	    b->seq != seq + 1 ||
-	    should_split(b))
+	    b->seq != seq + 1)
 		goto out;
 
 	op->replace = KEY(op->inode, bio_end(bio), bio_sectors(bio));
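
A quick way to compare the two counts after a run, sketched on the
assumption that the debug patch above is applied and its message text is
left unchanged:

# how many times the new printk has fired since boot
dmesg | grep -c 'bch_btree_insert_check_key() failed'
# cache misses over the same interval (substitute your cache-set UUID)
cat /sys/fs/bcache/<cache-set-uuid>/stats_total/cache_misses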
* Re: problem w/ read caching..
From: Brad Walker @ 2012-10-01 22:26 UTC
To: linux-bcache

Kent Overstreet <koverstreet@...> writes:
>
> I was just browsing around the code, and I bet I know what it is -
> btree_insert_check_key() is failing because the btree node is full.
>
> But we should confirm this really is what's going on... Can you apply
> this patch and rerun to test my theory? See if the number of times the
> printk fires lines up with the number of cache misses.

I applied this change and I see a LOT of the messages.

And the rate seems to be increasing.

-brad w.
* Re: problem w/ read caching..
From: Kent Overstreet @ 2012-10-01 23:00 UTC
To: Brad Walker; +Cc: linux-bcache

On Mon, Oct 01, 2012 at 10:26:43PM +0000, Brad Walker wrote:
> I applied this change and I see a LOT of the messages.
>
> And the rate seems to be increasing.

Sweet, we know what it is then.

So, like I mentioned, this won't be an issue on any workload with mixed
reads/writes, so if that's what your production workloads are then this
may not matter to you.

For warming up the cache, doing a few random writes (just enough that
you hit all the btree nodes - and there aren't many btree nodes; cat
internal/btree_nodes) will fix it.

A real fix for this shouldn't be too hard, but it's not exactly trivial
and it'll be a pain to test... not quite sure when I'll get to it, but
it would be good to have it fixed.
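
One way to mix in those writes is to give the warm-up job a small write
fraction - a sketch using fio as a stand-in for the vdbench workload
described earlier, with assumed sizing and device names:

# 99% random reads / 1% random writes over the 10GB test region, 4k blocks;
# the occasional write gives full btree nodes a chance to split.
# Note: this writes to the raw bcache device, so only use it on a test setup.
fio --name=warmup --filename=/dev/bcache0 --direct=1 --ioengine=libaio \
    --rw=randrw --rwmixread=99 --bs=4k --iodepth=32 --size=10G --runtime=300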
* Re: problem w/ read caching..
From: Kent Overstreet @ 2012-10-03 4:44 UTC
To: Brad Walker; +Cc: linux-bcache

Man, this is a frustrating issue :)

I haven't been able to come up with a way of fixing it right without
rewriting a bunch of code. I do have a workaround figured out, though,
if this is a real issue for you - it'll just increase internal
fragmentation in the btree a bit, but with the large btree nodes you
normally want, not enough to matter.

Though like I said, as long as your workload isn't 100% reads this
shouldn't be an issue.
* Re: problem w/ read caching..
From: Brad Walker @ 2012-10-08 16:39 UTC
To: linux-bcache

Kent Overstreet <koverstreet@...> writes:
>
> I haven't been able to come up with a way of fixing it right without
> rewriting a bunch of code. I do have a workaround figured out, though,
> if this is a real issue for you - it'll just increase internal
> fragmentation in the btree a bit, but with the large btree nodes you
> normally want, not enough to matter.
>
> Though like I said, as long as your workload isn't 100% reads this
> shouldn't be an issue.

Sorry to be away for a few days.

I can imagine it is frustrating. I've been looking at the code and it
seems pretty tough. I think it would be worthwhile to have this fixed if
possible, as it benefits small caches (i.e. 1GB in size) as well.

Basically, I made my mix of reads versus writes 99 to 1, like you said to
do. So 99% of the time I'm doing a read. When I do this, I never see my
IOPS rate get over approximately 45K.

Also, previously, when I ran my test over a 1GB region with 100% reads,
once the cache warmed I would see 0.350ms response time for a read. Once
I changed the read/write mixture to 99% reads, the response time went
down to 0.150ms.

So I think it would be beneficial to both small and large workloads if we
could fix this.

I'm happy to test this for you or help out in any way possible.

Thanks.

-brad w.
* Re: problem w/ read caching..
From: Brad Walker @ 2012-09-28 16:31 UTC
To: jason; +Cc: linux-bcache

On Thu, Sep 27, 2012 at 9:41 PM, jason wrote:
> One thing that comes to mind for me is your 1gb test may be mostly hitting
> the ram cache on the ssd. That is generally where ssd makers, consumer and
> to a lesser extent enterprise, get the peak I/O ops numbers from.

I understand your thinking. But in tests using Facebook FlashCache, as
well as our internally developed caching software (I work for a big
storage company), that is not what I see. What I see is that once the
cache warms up, regardless of the size (i.e. 1GB, 10GB, 100GB), access
times are sub-millisecond.

> Can you do the same tests, 10gb and 1gb, using only the ssd as your block
> device to get a baseline for it without bcache in the picture?

Good suggestion! I have done this, and the data shows that my SSD has a
baseline of 800K IOPS when using a 4k block size.

> I would also do some mixed r/w tests just to sort of get a real world
> profile for your ssd and system.

I'll take a look at doing this over the weekend.

> You also don't mention what type and how many spindles you have in the
> mechanical array.

I'm using a RAID array which has lots of horsepower; the RAID 5
configuration has 6 drives, each a 15K rpm type. So the array has very
good performance.

One data point: when I run Facebook flashcache, once the cache is fully
warmed, I see about 0.60ms response time.

Any suggestions?

-brad w.