* Quick bcache benchmark
From: Kent Overstreet @ 2011-12-06 8:22 UTC
To: linux-bcache, linux-kernel
I've been very remiss in posting benchmarks; this isn't much, but if
anyone has suggestions for what they want I'll see if I can run it.
This is on an old Corsair Nova - bcache can go something like 10x faster
but this is what I have at home. The profile is still interesting,
though.
The benchmark is 4k random O_DIRECT reads on a 16 GB file, all in cache
- the idea is to push the b+tree.
Also, the backing device is an md raid10 - so that's working, provided
you format your cache with buckets no larger than 1 MB.
root@utumno:/mnt# perf record -afg fio ~/rw4k
randwrite: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [69914K/0K /s] [17.7K/0 iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1247
read : io=16384MB, bw=68713KB/s, iops=17178 , runt=244169msec
cpu : usr=5.66%, sys=22.93%, ctx=4198688, majf=0, minf=85
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=4194367/0/0, short=0/0/0
Run status group 0 (all jobs):
READ: io=16384MB, aggrb=68712KB/s, minb=70361KB/s, maxb=70361KB/s, mint=244169msec, maxt=244169msec
Disk stats (read/write):
bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
7.74% fio fio [.] 0x1d2ed
5.26% swapper [kernel.kallsyms] [k] ahci_interrupt
3.02% swapper [kernel.kallsyms] [k] mwait_idle
2.52% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.82% swapper [kernel.kallsyms] [k] ahci_scr_read
1.68% fio [kernel.kallsyms] [k] __bset_search <- first bcache function
1.37% fio [kernel.kallsyms] [k] __blockdev_direct_IO
1.36% swapper [kernel.kallsyms] [k] irq_entries_start
1.25% swapper [kernel.kallsyms] [k] mix_pool_bytes_extract
1.06% swapper [kernel.kallsyms] [k] kmem_cache_free
0.94% fio [kernel.kallsyms] [k] __switch_to
0.92% fio [kernel.kallsyms] [k] system_call
0.87% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.84% swapper [kernel.kallsyms] [k] ata_qc_new_init
0.80% fio [kernel.kallsyms] [k] __schedule
0.76% fio [kernel.kallsyms] [k] do_io_submit
0.74% fio [kernel.kallsyms] [k] _raw_spin_lock_irq
0.74% swapper [kernel.kallsyms] [k] _raw_spin_lock
0.73% fio [kernel.kallsyms] [k] ext4_ext_find_extent
0.71% fio [kernel.kallsyms] [k] kmem_cache_alloc
0.70% fio [kernel.kallsyms] [k] read_events
0.65% fio libaio.so.1.0.1 [.] 0x665
0.64% swapper [kernel.kallsyms] [k] __schedule
0.63% swapper [kernel.kallsyms] [k] native_sched_clock
0.63% fio [kernel.kallsyms] [k] btree_search_leaf <- second bcache function
0.62% fio [kernel.kallsyms] [k] bcache_make_request
0.59% swapper [kernel.kallsyms] [k] read_tsc
0.58% fio [kernel.kallsyms] [k] aio_read_evt
0.58% swapper [kernel.kallsyms] [k] select_task_rq_fair
0.57% swapper [kernel.kallsyms] [k] tick_nohz_stop_sched_tick
0.57% fio [kernel.kallsyms] [k] __request_read
0.57% fio [kernel.kallsyms] [k] generic_make_request
0.56% swapper [kernel.kallsyms] [k] sd_prep_fn
0.53% fio [kernel.kallsyms] [k] _raw_spin_lock
0.51% swapper [kernel.kallsyms] [k] __hrtimer_start_range_ns
0.50% fio [kernel.kallsyms] [k] __math_state_restore
Some of the calls to kmem_cache_(free|alloc) are of course from bcache
but it looks to be under 25%.
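
A quick consistency check on the fio summary above, as a sketch rather than anything from the original post: if fio's KB and MB are 1024-based (an assumption here), the reported IOPS, bandwidth and runtime should agree with one another.

```python
# Cross-check the fio summary figures from the random read run.
# Assumption: fio 1.59 reports KB and MB in multiples of 1024 bytes.
BLOCK_KB = 4          # bs=4K
IOPS = 17178          # iops=17178
TOTAL_MB = 16384      # io=16384MB
RUNTIME_MS = 244169   # runt=244169msec

# Bandwidth implied by the IOPS figure and the block size.
bandwidth_kb_s = IOPS * BLOCK_KB

# Time needed to move the whole file at that bandwidth.
expected_runtime_s = TOTAL_MB * 1024 / bandwidth_kb_s

# Number of 4K requests needed to cover the file exactly once.
total_ios = TOTAL_MB * 1024 // BLOCK_KB

print(f"bandwidth ~ {bandwidth_kb_s} KB/s")
print(f"runtime   ~ {expected_runtime_s:.1f} s (fio reported {RUNTIME_MS / 1000:.1f} s)")
print(f"requests  = {total_ios}")
```

The implied bandwidth matches the aggrb figure, and the request count is within a few dozen of the total that fio reports as issued.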
* Re: Quick bcache benchmark
From: Kent Overstreet @ 2011-12-06 11:56 UTC
To: Bostjan Skufca; +Cc: linux-bcache

On Tue, Dec 06, 2011 at 11:39:57AM +0100, Bostjan Skufca wrote:
> Random write test?

Sure.

That corsair was giving me /terrible/ write performance, pulled the
intel SSD out of my other machine (unregistered the cache from the
backing device and attached the new SSD all without unmounting the
filesystem :)

Write performance with the intel is not /awesome/, but much more
reasonable:

root@utumno:/mnt# perf record -afg fio ~/rw4k
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/98365K /s] [0 /24.2K iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1560
write: io=16384MB, bw=89547KB/s, iops=22386 , runt=187359msec
cpu : usr=3.94%, sys=14.82%, ctx=300435, majf=0, minf=19
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=0/4194367/0, short=0/0/0
Run status group 0 (all jobs):
WRITE: io=16384MB, aggrb=89547KB/s, minb=91696KB/s, maxb=91696KB/s, mint=187359msec, maxt=187359msec
Disk stats (read/write):
bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
8.97% fio fio [.] 0xd1b2
1.64% fio [kernel.kallsyms] [k] bio_insert <- first bcache function
1.56% fio [kernel.kallsyms] [k] __blockdev_direct_IO
1.24% kworker/1:2 [kernel.kallsyms] [k] __bset_search
1.24% kworker/0:0 [kernel.kallsyms] [k] __bset_search
1.19% swapper [kernel.kallsyms] [k] ahci_interrupt
1.17% kworker/0:1 [kernel.kallsyms] [k] __bset_search
1.17% kworker/1:0 [kernel.kallsyms] [k] __bset_search
1.09% fio [kernel.kallsyms] [k] system_call
1.06% kworker/0:2 [kernel.kallsyms] [k] __bset_search
1.06% kworker/1:1 [kernel.kallsyms] [k] __bset_search
1.04% fio [kernel.kallsyms] [k] ext4_ext_find_extent
0.96% fio [kernel.kallsyms] [k] _raw_spin_lock_irq
0.92% fio [kernel.kallsyms] [k] bcache_make_request
0.87% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0.85% swapper [kernel.kallsyms] [k] mwait_idle
0.83% fio [kernel.kallsyms] [k] do_io_submit
0.77% fio [kernel.kallsyms] [k] memset
0.70% fio [kernel.kallsyms] [k] kmem_cache_alloc
0.65% fio [kernel.kallsyms] [k] md5_transform
0.63% fio [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0.61% fio [kernel.kallsyms] [k] _raw_spin_lock
0.58% fio [kernel.kallsyms] [k] generic_make_request
0.57% fio libaio.so.1.0.1 [.] 0x6b7
0.50% fio [kernel.kallsyms] [k] gup_pte_range

haven't seen bio_insert() show up that high in a profile before, wonder
what's up with that..

Reran the random read benchmark with the intel:

root@utumno:/mnt# perf record -afg fio ~/rw4k
randwrite: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [190.9M/0K /s] [47.7K/0 iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1575
read : io=16384MB, bw=153120KB/s, iops=38279 , runt=109571msec
cpu : usr=7.22%, sys=52.15%, ctx=678086, majf=0, minf=85
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=4194367/0/0, short=0/0/0
Run status group 0 (all jobs):
READ: io=16384MB, aggrb=153119KB/s, minb=156794KB/s, maxb=156794KB/s, mint=109571msec, maxt=109571msec
Disk stats (read/write):
bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Basically, whatever hardware you have bcache will easily max it out.
* Re: Quick bcache benchmark
From: Bostjan Skufca @ 2011-12-06 14:10 UTC
To: Kent Overstreet; +Cc: linux-bcache

Nice, 22k iops for random write is not bad at all (especially compared
to spinning disks :)

I have a couple of questions; can you please confirm that I understand
bcache correctly:

1. When you issue that many random write requests, they get written to
   the SSD first. Then they are slowly propagated from SSD to spinning
   disk, right? In original order, or is the order optimized?
2. What about when I unregister bcache from a device? Does it flush
   changes from SSD to platter?
3. Same question as (2) for unmounting a drive?
4. If the machine crashes, will bcache replay changes from SSD to
   platter at mount time?
5. Does it export the number of writes that are pending on SSD, via
   some /proc or /sys interface?
6. Is the read cache hot or cold at boot time?

I know that is overkill for the wording "couple of questions", sorry :)

b.
* Re: Quick bcache benchmark
From: Marcus Sorensen @ 2011-12-06 17:02 UTC
To: Bostjan Skufca; +Cc: Kent Overstreet, linux-bcache

I'm also curious as to how it decides what to keep in cache and what to
toss out, what to write direct to platter and what to buffer. I've been
testing LSI's cachecade 2.0 pro, and my intent is to post some
benchmarks between the two. From what I've seen you get at most 1/2
performance of your SSD if everything could fit into cache, I'm not
sure if that's due to their algorithm and how they decide what's SSD
worthy and what's not.
* Re: Quick bcache benchmark
From: Kent Overstreet @ 2011-12-09 10:02 UTC
To: Marcus Sorensen; +Cc: Bostjan Skufca, linux-bcache

Weird. That wouldn't be blocksize - a tiny bucket size could cause
performance issues, but not consistent with what you describe.

Might be some sort of interaction with xfs, I'll have to see if I can
reproduce it.

On Thu, Dec 8, 2011 at 6:32 PM, Marcus Sorensen wrote:
> Got to try this out quickly this afternoon. Used 200GB hardware raid1
> caching for 8 disk, 8T raid 10. Enabled writeback, put xfs on bcache0.
> Mkfs.xfs took awhile, which was unusual. I mounted the filesystem, created
> an 8GB file, which was fast. Then ran some 512b random reads against it (16
> threads), almost sad speed. Switched same test to random writes, and it was
> as slow as spindle. Some of the threads even threw "blocked for 120 seconds"
> traces. I wonder if my blocksize is set wrong on the cache, sort of hard to
> find the appropriate numbers.
* Re: Quick bcache benchmark
From: Marcus Sorensen @ 2011-12-09 17:09 UTC
To: Kent Overstreet; +Cc: linux-bcache

Here's some more info. I'm running kernel 3.1.4. When I do random
writes, the 'bypassed' number increases in stats. Now I'm random
writing direct to /dev/bcache0 and get the same result.

The application I'm using to test does the following:

1. looks at size of test file
2. divides size of file by a cmd line specified io size (tried 512b
   and 4k) and considers that the blockcount for file
3. randomly selects a block number between 0 and blockcount
4. writes a random string of characters of blocksize to specified block
5. repeat 3 and 4

I'll try a few other benchmark tools.

[root@sansrv2-10 bcache]# for i in `ls`; do echo -n "$i "; cat $i; done
label
readahead 0
running 1
sequential_cutoff 4.0M
sequential_merge 1
state dirty
verify 0
writeback 1
writeback_delay 30
writeback_metadata 1
writeback_percent 0
writeback_running 1

SSD benchmark:
[root@sansrv2-10 ~]# ./seekmark -t16 -q -w destroy-data -f /dev/sde

WRITE benchmarking against /dev/sde 218880 MB

total time: 5.39, time per WRITE request(ms): 0.067
14839.55 total seeks per sec, 927.47 WRITE seeks per sec per thread

bcache0 benchmark:
[root@sansrv2-10 ~]# ./seekmark -t16 -q -w destroy-data -f /dev/bcache0

WRITE benchmarking against /dev/bcache0 7628799 MB

total time: 510.75, time per WRITE request(ms): 6.384
156.63 total seeks per sec, 9.79 WRITE seeks per sec per thread

There also seems to be some work needed with clean-up. Since I'm
unfamiliar with how bcache works I attempted to make-bcache twice,
thinking I'd start over. That worked, but because my cache device was
already registered I was unable to re-register my newly formatted
cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
don't try to register things with the same name in the same
directory." I was still able to use my cache device via the old uuid,
but this will probably cause problems on reboot. Perhaps an unregister
file in /sys/fs/bcache would help. I also tried rmmod'ing bcache to
see if I could clear /sys/fs/bcache, but no luck. make-bcache should
perhaps check for an existing superblock, ask for confirmation, and
give some sort of instruction on how to unregister, or do it for you if
you reformat.
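
The five steps of the test application described above can be sketched roughly as follows. This is not seekmark's actual code: the function name and parameters are hypothetical, it is single-threaded, and it does not open the target with O_DIRECT, so it only illustrates the access pattern.

```python
import os
import random
import string


def random_write_test(path, io_size, count, seed=None):
    """Write `count` random blocks of `io_size` bytes at random block offsets.

    Hypothetical helper illustrating the access pattern only; returns the
    list of block numbers written. WARNING: overwrites data in `path`.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_WRONLY)
    try:
        # Step 1: look at the size of the test file (works for block devices too).
        size = os.lseek(fd, 0, os.SEEK_END)
        # Step 2: the file size divided by the I/O size gives the block count.
        blockcount = size // io_size
        if blockcount == 0:
            raise ValueError("target is smaller than one I/O unit")
        written = []
        for _ in range(count):  # Step 5: repeat steps 3 and 4.
            # Step 3: pick a random block number in [0, blockcount).
            block = rng.randrange(blockcount)
            # Step 4: write a random string of characters to that block.
            payload = "".join(rng.choices(string.ascii_letters, k=io_size))
            os.pwrite(fd, payload.encode("ascii"), block * io_size)
            written.append(block)
        return written
    finally:
        os.close(fd)
```

For example, `random_write_test("/path/to/testfile", 4096, 5000)` would issue 5000 writes of 4k each; pointing it at a device such as /dev/bcache0 destroys the data on it.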
* Re: Quick bcache benchmark
From: Marcus Sorensen @ 2011-12-09 17:14 UTC
To: Kent Overstreet; +Cc: linux-bcache

Oh, and here are my stats after running that write benchmark on
bcache0. That's pretty much the only thing I've done in these stats.

[root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i; done 2>/dev/null
bypassed 605M
cache_bypass_hits 333
cache_bypass_misses 77553
cache_hit_ratio 0
cache_hits 85
cache_miss_collisions 9256
cache_misses 10031
cache_readaheads 0
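
The cache_hit_ratio of 0 in these stats is consistent with the raw counters if the ratio is an integer percentage of hits over hits plus misses. That formula is an assumption here, not something stated in the thread; a quick check:

```python
# Assumption (not confirmed in the thread): cache_hit_ratio is computed as
# floor(100 * cache_hits / (cache_hits + cache_misses)).
def hit_ratio_percent(hits: int, misses: int) -> int:
    total = hits + misses
    # Guard against division by zero on a freshly cleared stats directory.
    return hits * 100 // total if total else 0


# Counters taken from the stats_day listing above.
stats_day = {"cache_hits": 85, "cache_misses": 10031}

ratio = hit_ratio_percent(stats_day["cache_hits"], stats_day["cache_misses"])
exact = 100 * stats_day["cache_hits"] / (
    stats_day["cache_hits"] + stats_day["cache_misses"]
)
print(ratio, round(exact, 2))  # prints: 0 0.84
```

Under that assumption the cache is being hit on fewer than 1% of lookups, so the reported 0 is rounding rather than a missing value.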
* Re: Quick bcache benchmark
From: Kent Overstreet @ 2011-12-10 6:33 UTC
To: Marcus Sorensen; +Cc: linux-bcache

On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
> Here's some more info. I'm running kernel 3.1.4. When I do random
> writes, the 'bypassed' number increases in stats. Now I'm random
> writing direct to /dev/bcache0 and get the same result.

Weird. From what you're describing it sounds like throttling is screwed
up (and it was recently), but I can't reproduce it now.

Can you try echoing 0 to congested_threshold_us in the cache set dir,
and seeing if that fixes it?

> There also seems to be some work needed with clean-up, since I'm
> unfamiliar with how bcache works I attempted to make-bcache twice,
> thinking I'd start over. That worked, but because my cache device was
> already registered I was unable to re-register my newly formatted
> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
> don't try to register things with the same name in the same
> directory." I was still able to use my cache device via the old uuid,
> but this will probably cause problems on reboot. Perhaps an unregister
> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
> perhaps check for an existing superblock, ask for confirmation, and
> give some sort instruction on how to unregister, or do it for you if
> you reformat.

Yeah, I think for some reason bcache isn't opening the devices
exclusively on 3.1. I'll have a look...
* Re: Quick bcache benchmark
2011-12-10 6:33 ` Kent Overstreet
@ 2011-12-10 15:02 ` Marcus Sorensen
[not found] ` <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-10 15:02 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

That keeps the 'bypassed' value from increasing, but it doesn't change
write performance.

BEFORE:
[root@sansrv2-10 stats_day]# cat *
27.6M
83
3500
0
166
24380
40660
0

...benchmarking...

AFTER:
[root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i;
> done 2>/dev/null
bypassed 27.6M
cache_bypass_hits 83
cache_bypass_misses 3500
cache_hit_ratio 0
cache_hits 410
cache_miss_collisions 48879
cache_misses 80545
cache_readaheads 0

/sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
average_key_size 0
block_size 2.0k
btree_cache_size 3.2M
bucket_size 1.0M
cache_available_percent 100
clear_stats congested 0
congested_threshold_us 0
dirty_data 0
io_error_halflife 0
io_error_limit 8
root_usage_percent 0
synchronous 1
tree_depth 1
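The name/value listing above can also be produced with a small helper that skips attributes which cannot be read (clear_stats appears to be one, since it prints no value in the listing). This is a sketch only; dump_attrs is a made-up name and nothing here is specific to bcache.

```shell
# Print "name value" for every readable regular file in a directory.
# dump_attrs is a hypothetical helper for illustration.
dump_attrs() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories and anything that is not a regular file.
        [ -f "$f" ] || continue
        # Only print attributes whose contents can actually be read;
        # write-only attributes make cat fail and are skipped silently.
        if value=$(cat "$f" 2>/dev/null); then
            printf '%s %s\n' "$(basename "$f")" "$value"
        fi
    done
}

# Usage, run from the parent of the stats directory shown above:
#   dump_attrs stats_day
```

Compared with looping over the output of ls, globbing keeps names with unusual characters intact, and each attribute ends up on its own line.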
[parent not found: <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Quick bcache benchmark
[not found] ` <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-15 23:40 ` Marcus Sorensen
[not found] ` <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-15 23:40 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
[parent not found: <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Quick bcache benchmark
[not found] ` <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 2:17 ` Kent Overstreet
[not found] ` <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-16 2:17 UTC (permalink / raw)
To: Marcus Sorensen; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Sorry, I was thinking about that issue for a while and then I got distracted...

It's not user error, it's an irritating corner case. Basically, it's
the result of a workaround for a particularly obscure data corruption
bug.

If a write bypasses the cache, it has to invalidate that region of the
cache; the null key it leaves in the cache will block cache misses
from adding that data to the cache until the btree node fills up (and
possibly splits).

It hasn't been an issue for us in normal operation, but when you're
just testing - i.e. you don't have much load - that node split may not
happen for a long time, and so if for some reason a bunch of data
bypassed the cache... well, you see what happens.

Unfortunately a better solution to the original race is not going to
be simple, so it's probably not going to be done in the very near
future. It's a _very_ difficult race to hit, but in the meantime I'd
rather lose performance than corrupt data.

But the good news is if you put normal server-ish load on it the issue
should go away in steady state operation.
[parent not found: <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Quick bcache benchmark
[not found] ` <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 4:28 ` Marcus Sorensen
[not found] ` <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-16 4:28 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Thanks! I'll put it through some more tests. I kind of figured that
something more real-world would help.
[parent not found: <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Quick bcache benchmark
[not found] ` <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 18:49 ` Marcus Sorensen
[not found] ` <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-16 18:49 UTC (permalink / raw)
To: Kent Overstreet; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Actually I think this IS user error. I ran a benchmark with FIO, and
the results were practically identical with and without bcache. I
applied the 3.1.4 kernel patch on top of your 3.1 tree; even though it
applied cleanly, I'm guessing that wiped something out. Here are my
stats after running the benchmark on bcache, and also included is the
fio config.

bypassed 32.1G
cache_bypass_hits 5482
cache_bypass_misses 194862
cache_hit_ratio 3
cache_hits 786
cache_miss_collisions 206
cache_misses 19447
cache_readaheads 0

[global]
ioengine=libaio
iodepth=4
invalidate=1 #make sure we're not cached locally
direct=1 #don't use buffers during test (test without local caches)
thread
ramp_time=20
time_based
runtime=180

[8RandomReadWriters]
rw=randrw
numjobs=8
blocksize=4k
size=1G

[2SequentialReadWriters]
rw=rw
numjobs=2
size=4G
blocksize_range=64k-1M
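As a sanity check on the stats above, the reported cache_hit_ratio values match hits / (hits + misses) expressed as a whole-number percentage. That formula is an assumption inferred from the numbers in this thread, not something stated in it, and hit_ratio_percent is a made-up helper name.

```shell
# Integer hit ratio in percent, assuming ratio = hits / (hits + misses).
# The formula is inferred from the posted stats, not taken from bcache.
hit_ratio_percent() {
    hits=$1
    misses=$2
    total=$((hits + misses))
    # Avoid dividing by zero when no requests have been counted yet.
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    # Shell arithmetic truncates, so 3.88 becomes 3.
    echo $((100 * hits / total))
}

hit_ratio_percent 786 19447   # stats from the fio run
hit_ratio_percent 410 80545   # stats from the earlier AFTER listing
```

With 786 hits against 19447 misses the result is 3, and with 410 hits against 80545 misses it is 0, the same as the cache_hit_ratio lines in the two listings.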
[parent not found: <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Quick bcache benchmark
[not found] ` <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 18:52 ` Kent Overstreet
[not found] ` <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-16 18:52 UTC (permalink / raw)
To: Marcus Sorensen; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

That's what you'd expect in writethrough mode when you aren't getting
any cache hits - try flipping on writeback and see what happens.
[parent not found: <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Quick bcache benchmark [not found] ` <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2011-12-16 22:45 ` Marcus Sorensen [not found] ` <CALFpzo5rehRqabN=2C11eLTyr6khvBRwX1JaJGNdkguMs-Fueg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: Marcus Sorensen @ 2011-12-16 22:45 UTC (permalink / raw) To: Kent Overstreet; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA Yeah, I echoed 1 into writeback before doing the test. And why wouldn't I get any cache hits? On Fri, Dec 16, 2011 at 11:52 AM, Kent Overstreet <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote: > That's what you'd expect in writethrough mode when you aren't getting > any cache hits - try flipping on writeback and see what happens. > > On Fri, Dec 16, 2011 at 10:49 AM, Marcus Sorensen <shadowsor-Re5JQEeQqe8@public.gmane.orgm> wrote: >> Actually I think this IS user error. I ran a benchmark with FIO, and >> the results were practically identical with and without bcache. I >> applied the 3.1.4 kernel patch on top of your 3.1 tree, even though it >> applied cleanly I'm guessing that wiped something out. Here are my >> stats after running the benchmark on bcache, and also included is the >> fio config. >> >> bypassed 32.1G >> cache_bypass_hits 5482 >> cache_bypass_misses 194862 >> cache_hit_ratio 3 >> cache_hits 786 >> cache_miss_collisions 206 >> cache_misses 19447 >> cache_readaheads 0 >> >> [global] >> ioengine=libaio >> iodepth=4 >> invalidate=1 #make sure we're not cached locally >> direct=1 #don't use buffers during test (test without local caches) >> thread >> ramp_time=20 >> time_based >> runtime=180 >> >> [8RandomReadWriters] >> rw=randrw >> numjobs=8 >> blocksize=4k >> size=1G >> >> [2SequentialReadWriters] >> rw=rw >> numjobs=2 >> size=4G >> blocksize_range=64k-1M >> >> >> On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8@public.gmane.orgm> wrote: >>> Thanks! 
I'll put it through some more tests. I kind of figured that
>>> something more real-world would help.
>>>
>>> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
>>>> Sorry, I was thinking about that issue for awhile and then I got distracted...
>>>>
>>>> It's not user error, it's an irritating corner case. Basically, it's
>>>> the result of a workaround for a particularly obscure data corruption
>>>> bug.
>>>>
>>>> If a write bypasses the cache, it has to invalidate that region of the
>>>> cache; the null key it leaves in the cache will block cache misses
>>>> from adding that data to the cache until the btree node fills up (and
>>>> possibly splits).
>>>>
>>>> It hasn't been an issue for us in normal operation, but when you're
>>>> just testing - i.e. you don't have much load - that node split may not
>>>> happen for a long time, and so if for some reason a bunch of data
>>>> bypassed the cache... well, you see what happens.
>>>>
>>>> Unfortunately a better solution to the original race is not going to
>>>> be simple, so it's probably not going to be done in the very near
>>>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>>>> rather lose performance than corrupt data.
>>>>
>>>> But the good news is if you put normal server-ish load on it the issue
>>>> should go away in steady state operation.
>>>>
>>>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>>>
>>>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>>>> write performance.
>>>>>>
>>>>>> BEFORE:
>>>>>> [root@sansrv2-10 stats_day]# cat *
>>>>>> 27.6M
>>>>>> 83
>>>>>> 3500
>>>>>> 0
>>>>>> 166
>>>>>> 24380
>>>>>> 40660
>>>>>> 0
>>>>>>
>>>>>> ...benchmarking...
>>>>>>
>>>>>> AFTER:
>>>>>>
>>>>>> [root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i;
>>>>>>> done 2>/dev/null
>>>>>> bypassed 27.6M
>>>>>> cache_bypass_hits 83
>>>>>> cache_bypass_misses 3500
>>>>>> cache_hit_ratio 0
>>>>>> cache_hits 410
>>>>>> cache_miss_collisions 48879
>>>>>> cache_misses 80545
>>>>>> cache_readaheads 0
>>>>>>
>>>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>>>
>>>>>> average_key_size 0
>>>>>> block_size 2.0k
>>>>>> btree_cache_size 3.2M
>>>>>> bucket_size 1.0M
>>>>>> cache_available_percent 100
>>>>>> clear_stats congested 0
>>>>>> congested_threshold_us 0
>>>>>> dirty_data 0
>>>>>> io_error_halflife 0
>>>>>> io_error_limit 8
>>>>>> root_usage_percent 0
>>>>>> synchronous 1
>>>>>> tree_depth 1
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>>>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>>>
>>>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>>>
>>>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>>>> and seeing if that fixes it?
>>>>>>>
>>>>>>>> There also seems to be some work needed with clean-up, since I'm
>>>>>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>>>> already registered I was unable to re-register my newly formatted
>>>>>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>>>> don't try to register things with the same name in the same
>>>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>>>> give some sort instruction on how to unregister, or do it for you if
>>>>>>>> you reformat.
>>>>>>>
>>>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>>>> exclusively on 3.1. I'll have a look...
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
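The corner case Kent describes in the quoted text (a bypassing write leaves a null key, and that key blocks cache misses from inserting data until the btree node fills up) can be illustrated with a toy model. This is only a sketch under stated assumptions, not bcache's code: `Node`, `NULL_KEY`, `blocked_inserts` and the "drop null keys once the node is full" rule are hypothetical simplifications chosen for illustration.

```python
# Toy model only; every name here is hypothetical and none of it is bcache's code.

NULL_KEY = object()  # marker left behind when a write bypasses the cache


class Node:
    """A single btree node that is rewritten once it holds `capacity` keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = {}            # region -> cached data, or NULL_KEY
        self.blocked_inserts = 0  # cache misses refused because of a null key

    def _compact_if_full(self):
        # Assumption: a full node gets rewritten and null keys are dropped.
        # A real btree might also split here; the toy model does not.
        if len(self.keys) >= self.capacity:
            self.keys = {r: v for r, v in self.keys.items() if v is not NULL_KEY}

    def bypass_write(self, region):
        """A write that skips the cache invalidates that region."""
        self.keys[region] = NULL_KEY
        self._compact_if_full()

    def cache_miss(self, region, data):
        """Try to cache data read from the backing device; True if cached."""
        if self.keys.get(region) is NULL_KEY:
            self.blocked_inserts += 1
            return False
        self.keys[region] = data
        self._compact_if_full()
        return True

    def lookup(self, region):
        value = self.keys.get(region)
        return None if value is NULL_KEY else value


node = Node(capacity=4)
node.bypass_write("a")                # leaves a null key for region "a"
blocked = node.cache_miss("a", "A1")  # refused while the null key is present

# Under light load nothing else fills the node, so "a" stays uncacheable.
# Enough other traffic fills the node, and the null key goes away:
for region in ("b", "c", "d"):
    node.cache_miss(region, region.upper())

unblocked = node.cache_miss("a", "A2")
print(blocked, unblocked, node.blocked_inserts)
```

In this model the first insert for region "a" is refused and the second succeeds only after unrelated traffic has filled the node, which mirrors the remark that the issue should fade under steady server-like load.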
[parent not found: <CALFpzo5rehRqabN=2C11eLTyr6khvBRwX1JaJGNdkguMs-Fueg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Quick bcache benchmark
[not found] ` <CALFpzo5rehRqabN=2C11eLTyr6khvBRwX1JaJGNdkguMs-Fueg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 23:33 ` Kent Overstreet
0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2011-12-16 23:33 UTC (permalink / raw)
To: Marcus Sorensen; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA
Sounds like the cache isn't getting populated for some reason. Have you
tried disabling throttling?

echo 0 > congested_threshold_us

With that off and writeback on you really ought to get some performance
improvement on random writes...

On Fri, Dec 16, 2011 at 2:45 PM, Marcus Sorensen
<shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Yeah, I echoed 1 into writeback before doing the test. And why
> wouldn't I get any cache hits?
>
> On Fri, Dec 16, 2011 at 11:52 AM, Kent Overstreet
> <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> That's what you'd expect in writethrough mode when you aren't getting
>> any cache hits - try flipping on writeback and see what happens.
>>
>> On Fri, Dec 16, 2011 at 10:49 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>> Actually I think this IS user error. I ran a benchmark with FIO, and
>>> the results were practically identical with and without bcache. I
>>> applied the 3.1.4 kernel patch on top of your 3.1 tree, even though it
>>> applied cleanly I'm guessing that wiped something out. Here are my
>>> stats after running the benchmark on bcache, and also included is the
>>> fio config.
>>>
>>> bypassed 32.1G
>>> cache_bypass_hits 5482
>>> cache_bypass_misses 194862
>>> cache_hit_ratio 3
>>> cache_hits 786
>>> cache_miss_collisions 206
>>> cache_misses 19447
>>> cache_readaheads 0
>>>
>>> [global]
>>> ioengine=libaio
>>> iodepth=4
>>> invalidate=1 #make sure we're not cached locally
>>> direct=1 #don't use buffers during test (test without local caches)
>>> thread
>>> ramp_time=20
>>> time_based
>>> runtime=180
>>>
>>> [8RandomReadWriters]
>>> rw=randrw
>>> numjobs=8
>>> blocksize=4k
>>> size=1G
>>>
>>> [2SequentialReadWriters]
>>> rw=rw
>>> numjobs=2
>>> size=4G
>>> blocksize_range=64k-1M
>>>
>>>
>>> On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>> Thanks! I'll put it through some more tests. I kind of figured that
>>>> something more real-world would help.
>>>>
>>>> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
>>>>> Sorry, I was thinking about that issue for awhile and then I got distracted...
>>>>>
>>>>> It's not user error, it's an irritating corner case. Basically, it's
>>>>> the result of a workaround for a particularly obscure data corruption
>>>>> bug.
>>>>>
>>>>> If a write bypasses the cache, it has to invalidate that region of the
>>>>> cache; the null key it leaves in the cache will block cache misses
>>>>> from adding that data to the cache until the btree node fills up (and
>>>>> possibly splits).
>>>>>
>>>>> It hasn't been an issue for us in normal operation, but when you're
>>>>> just testing - i.e. you don't have much load - that node split may not
>>>>> happen for a long time, and so if for some reason a bunch of data
>>>>> bypassed the cache... well, you see what happens.
>>>>>
>>>>> Unfortunately a better solution to the original race is not going to
>>>>> be simple, so it's probably not going to be done in the very near
>>>>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>>>>> rather lose performance than corrupt data.
>>>>>
>>>>> But the good news is if you put normal server-ish load on it the issue
>>>>> should go away in steady state operation.
>>>>>
>>>>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>>>>
>>>>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>>>>> write performance.
>>>>>>>
>>>>>>> BEFORE:
>>>>>>> [root@sansrv2-10 stats_day]# cat *
>>>>>>> 27.6M
>>>>>>> 83
>>>>>>> 3500
>>>>>>> 0
>>>>>>> 166
>>>>>>> 24380
>>>>>>> 40660
>>>>>>> 0
>>>>>>>
>>>>>>> ...benchmarking...
>>>>>>>
>>>>>>> AFTER:
>>>>>>>
>>>>>>> [root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i;
>>>>>>>> done 2>/dev/null
>>>>>>> bypassed 27.6M
>>>>>>> cache_bypass_hits 83
>>>>>>> cache_bypass_misses 3500
>>>>>>> cache_hit_ratio 0
>>>>>>> cache_hits 410
>>>>>>> cache_miss_collisions 48879
>>>>>>> cache_misses 80545
>>>>>>> cache_readaheads 0
>>>>>>>
>>>>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>>>>
>>>>>>> average_key_size 0
>>>>>>> block_size 2.0k
>>>>>>> btree_cache_size 3.2M
>>>>>>> bucket_size 1.0M
>>>>>>> cache_available_percent 100
>>>>>>> clear_stats congested 0
>>>>>>> congested_threshold_us 0
>>>>>>> dirty_data 0
>>>>>>> io_error_halflife 0
>>>>>>> io_error_limit 8
>>>>>>> root_usage_percent 0
>>>>>>> synchronous 1
>>>>>>> tree_depth 1
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>>>>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>>>>
>>>>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>>>>
>>>>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>>>>> and seeing if that fixes it?
>>>>>>>>
>>>>>>>>> There also seems to be some work needed with clean-up, since I'm
>>>>>>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>>>>> already registered I was unable to re-register my newly formatted
>>>>>>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>>>>> don't try to register things with the same name in the same
>>>>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>>>>> give some sort instruction on how to unregister, or do it for you if
>>>>>>>>> you reformat.
>>>>>>>>
>>>>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>>>>> exclusively on 3.1. I'll have a look...
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>>>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
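As a quick sanity check on the quoted counters, the sketch below parses the "name value" output of the stats loop and recomputes the hit percentage. The formula, hits over hits plus misses truncated to a whole percent, is an assumption and is not stated anywhere in the thread; it happens to agree with both sets of figures quoted above. `parse_stats` and `hit_ratio_percent` are hypothetical helper names.

```python
def parse_stats(text):
    """Parse 'name value' lines (as printed by the stats loop) into a dict."""
    stats = {}
    for line in text.strip().splitlines():
        name, value = line.split(None, 1)
        stats[name] = value.strip()
    return stats


def hit_ratio_percent(hits, misses):
    """Whole-number hit percentage; 0 when there were no lookups.

    Assumed formula: hits * 100 / (hits + misses), truncated.
    """
    total = hits + misses
    return hits * 100 // total if total else 0


# Counters copied from the fio run quoted in this message.
QUOTED_STATS = """
bypassed 32.1G
cache_bypass_hits 5482
cache_bypass_misses 194862
cache_hit_ratio 3
cache_hits 786
cache_miss_collisions 206
cache_misses 19447
cache_readaheads 0
"""

stats = parse_stats(QUOTED_STATS)
computed = hit_ratio_percent(int(stats["cache_hits"]), int(stats["cache_misses"]))
print(computed, stats["cache_hit_ratio"])
```

Applied to the earlier run (410 hits, 80545 misses) the same helper gives 0, matching the `cache_hit_ratio 0` reported there.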
Thread overview: 17+ messages
2011-12-06 8:22 Quick bcache benchmark Kent Overstreet
2011-12-06 8:22 ` Kent Overstreet
[not found] ` <CAEp_DRCHQo1JyPZk6dKYZjJvxtaR7yxpEDtGE+uYK9n2dNb2Pw@mail.gmail.com>
[not found] ` <CAEp_DRCHQo1JyPZk6dKYZjJvxtaR7yxpEDtGE+uYK9n2dNb2Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-06 11:56 ` Kent Overstreet
2011-12-06 14:10 ` Bostjan Skufca
[not found] ` <CAEp_DRDEQLSkJ3arx81qM1M4iSJ5Wy0dwZhsrYD=94682qw8JQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-06 17:02 ` Marcus Sorensen
[not found] ` <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw@mail.gmail.com>
[not found] ` <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-09 10:02 ` Kent Overstreet
[not found] ` <CAC7rs0vvJbN6iOvvKJ3Xgm5BAzBxBYL+e6_F_ZzfREEbnC9-CA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-09 17:09 ` Marcus Sorensen
[not found] ` <CALFpzo6kXrC+8kkqrtRuMcsqnRL-oPc+B3A-Vq3wkWhRLBbAJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-09 17:14 ` Marcus Sorensen
2011-12-10 6:33 ` Kent Overstreet
2011-12-10 15:02 ` Marcus Sorensen
[not found] ` <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-15 23:40 ` Marcus Sorensen
[not found] ` <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 2:17 ` Kent Overstreet
[not found] ` <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 4:28 ` Marcus Sorensen
[not found] ` <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 18:49 ` Marcus Sorensen
[not found] ` <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 18:52 ` Kent Overstreet
[not found] ` <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 22:45 ` Marcus Sorensen
[not found] ` <CALFpzo5rehRqabN=2C11eLTyr6khvBRwX1JaJGNdkguMs-Fueg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 23:33 ` Kent Overstreet