Quick bcache benchmark

* Quick bcache benchmark
@ 2011-12-06  8:22 ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2011-12-06  8:22 UTC (permalink / raw)
  To: linux-bcache, linux-kernel

I've been very remiss in posting benchmarks; this isn't much, but if
anyone has suggestions for what they want I'll see if I can run it.

This is on an old Corsair Nova - bcache can go something like 10x faster
but this is what I have at home. The profile is still interesting,
though.

The benchmark is 4k random O_DIRECT reads on a 16 GB file, all in cache
- the idea is to push the b+tree.
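
The ~/rw4k job file itself was not posted. As a rough sketch only, a job
matching the parameters visible in the fio output (rw=randread, bs=4K,
libaio, iodepth=64, 16 GB, O_DIRECT) might be generated like this. The
helper name write_rw4k_job and the test file path are hypothetical; the
section name "randwrite" is taken from the job name fio prints.

```shell
# Hypothetical reconstruction; the real ~/rw4k job file is not in the thread.
# Writes a fio job file for 4k random O_DIRECT reads at queue depth 64.
write_rw4k_job() (
    job_file=$1      # where to write the job file
    test_file=$2     # file on the bcache-backed filesystem (assumed path)
    cat > "$job_file" <<EOF
[randwrite]
rw=randread
bs=4k
direct=1
ioengine=libaio
iodepth=64
size=16g
filename=$test_file
EOF
)
```

Something like "write_rw4k_job ~/rw4k /mnt/testfile" followed by
"fio ~/rw4k" would then run it; switching rw= to randwrite gives the
write variant.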

Also, the backing device is an md raid10 - so that's working, provided
you format your cache with buckets not greater than 1 MB.

root@utumno:/mnt# perf record -afg fio ~/rw4k
randwrite: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [69914K/0K /s] [17.7K/0  iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1247
  read : io=16384MB, bw=68713KB/s, iops=17178 , runt=244169msec
  cpu          : usr=5.66%, sys=22.93%, ctx=4198688, majf=0, minf=85
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued r/w/d: total=4194367/0/0, short=0/0/0



Run status group 0 (all jobs):
   READ: io=16384MB, aggrb=68712KB/s, minb=70361KB/s, maxb=70361KB/s, mint=244169msec, maxt=244169msec

Disk stats (read/write):
  bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
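
As a sanity check, the headline numbers can be recomputed from the raw
counts fio printed (total I/Os issued and runtime). A minimal sketch,
assuming every I/O is exactly 4 KB; the helper name fio_rates is made up
for illustration, and the results should land within a unit or so of
fio's own rounding.

```shell
# Recompute fio's summary figures from the raw counts it printed.
# Arguments: total I/Os issued, runtime in milliseconds, block size in KB.
fio_rates() {
    awk -v ios="$1" -v ms="$2" -v kb="$3" 'BEGIN {
        secs = ms / 1000
        printf "iops=%.0f bw=%.0fKB/s\n", ios / secs, ios * kb / secs
    }'
}
```

For this run that would be "fio_rates 4194367 244169 4", using the
issued total and runt values above.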

7.74%             fio  fio                 [.] 0x1d2ed
5.26%         swapper  [kernel.kallsyms]   [k] ahci_interrupt
3.02%         swapper  [kernel.kallsyms]   [k] mwait_idle
2.52%         swapper  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
1.82%         swapper  [kernel.kallsyms]   [k] ahci_scr_read
1.68%             fio  [kernel.kallsyms]   [k] __bset_search		<- first bcache function
1.37%             fio  [kernel.kallsyms]   [k] __blockdev_direct_IO
1.36%         swapper  [kernel.kallsyms]   [k] irq_entries_start
1.25%         swapper  [kernel.kallsyms]   [k] mix_pool_bytes_extract
1.06%         swapper  [kernel.kallsyms]   [k] kmem_cache_free
0.94%             fio  [kernel.kallsyms]   [k] __switch_to
0.92%             fio  [kernel.kallsyms]   [k] system_call
0.87%         swapper  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
0.84%         swapper  [kernel.kallsyms]   [k] ata_qc_new_init
0.80%             fio  [kernel.kallsyms]   [k] __schedule
0.76%             fio  [kernel.kallsyms]   [k] do_io_submit
0.74%             fio  [kernel.kallsyms]   [k] _raw_spin_lock_irq
0.74%         swapper  [kernel.kallsyms]   [k] _raw_spin_lock
0.73%             fio  [kernel.kallsyms]   [k] ext4_ext_find_extent
0.71%             fio  [kernel.kallsyms]   [k] kmem_cache_alloc
0.70%             fio  [kernel.kallsyms]   [k] read_events
0.65%             fio  libaio.so.1.0.1     [.] 0x665
0.64%         swapper  [kernel.kallsyms]   [k] __schedule
0.63%         swapper  [kernel.kallsyms]   [k] native_sched_clock
0.63%             fio  [kernel.kallsyms]   [k] btree_search_leaf	<- second bcache function
0.62%             fio  [kernel.kallsyms]   [k] bcache_make_request
0.59%         swapper  [kernel.kallsyms]   [k] read_tsc
0.58%             fio  [kernel.kallsyms]   [k] aio_read_evt
0.58%         swapper  [kernel.kallsyms]   [k] select_task_rq_fair
0.57%         swapper  [kernel.kallsyms]   [k] tick_nohz_stop_sched_tick
0.57%             fio  [kernel.kallsyms]   [k] __request_read
0.57%             fio  [kernel.kallsyms]   [k] generic_make_request
0.56%         swapper  [kernel.kallsyms]   [k] sd_prep_fn
0.53%             fio  [kernel.kallsyms]   [k] _raw_spin_lock
0.51%         swapper  [kernel.kallsyms]   [k] __hrtimer_start_range_ns
0.50%             fio  [kernel.kallsyms]   [k] __math_state_restore

Some of the calls to kmem_cache_(free|alloc) are of course from bcache
but it looks to be under 25%.


[parent not found: <CAEp_DRCHQo1JyPZk6dKYZjJvxtaR7yxpEDtGE+uYK9n2dNb2Pw@mail.gmail.com>]

[parent not found: <CAEp_DRCHQo1JyPZk6dKYZjJvxtaR7yxpEDtGE+uYK9n2dNb2Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]   ` <CAEp_DRCHQo1JyPZk6dKYZjJvxtaR7yxpEDtGE+uYK9n2dNb2Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-06 11:56     ` Kent Overstreet
  2011-12-06 14:10       ` Bostjan Skufca
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-06 11:56 UTC (permalink / raw)
  To: Bostjan Skufca; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Dec 06, 2011 at 11:39:57AM +0100, Bostjan Skufca wrote:
> Random write test?

Sure.

That Corsair was giving me /terrible/ write performance, so I pulled the
Intel SSD out of my other machine (unregistered the cache from the
backing device and attached the new SSD, all without unmounting the
filesystem :)

Write performance with the Intel is not /awesome/, but much more
reasonable:

root@utumno:/mnt# perf record -afg fio ~/rw4k 
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/98365K /s] [0 /24.2K iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1560
  write: io=16384MB, bw=89547KB/s, iops=22386 , runt=187359msec
  cpu          : usr=3.94%, sys=14.82%, ctx=300435, majf=0, minf=19
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued r/w/d: total=0/4194367/0, short=0/0/0



Run status group 0 (all jobs):
  WRITE: io=16384MB, aggrb=89547KB/s, minb=91696KB/s, maxb=91696KB/s, mint=187359msec, maxt=187359msec

Disk stats (read/write):
  bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

8.97%             fio  fio                 [.] 0xd1b2
1.64%             fio  [kernel.kallsyms]   [k] bio_insert		<- first bcache function
1.56%             fio  [kernel.kallsyms]   [k] __blockdev_direct_IO
1.24%     kworker/1:2  [kernel.kallsyms]   [k] __bset_search
1.24%     kworker/0:0  [kernel.kallsyms]   [k] __bset_search
1.19%         swapper  [kernel.kallsyms]   [k] ahci_interrupt
1.17%     kworker/0:1  [kernel.kallsyms]   [k] __bset_search
1.17%     kworker/1:0  [kernel.kallsyms]   [k] __bset_search
1.09%             fio  [kernel.kallsyms]   [k] system_call
1.06%     kworker/0:2  [kernel.kallsyms]   [k] __bset_search
1.06%     kworker/1:1  [kernel.kallsyms]   [k] __bset_search
1.04%             fio  [kernel.kallsyms]   [k] ext4_ext_find_extent
0.96%             fio  [kernel.kallsyms]   [k] _raw_spin_lock_irq
0.92%             fio  [kernel.kallsyms]   [k] bcache_make_request
0.87%         swapper  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
0.85%         swapper  [kernel.kallsyms]   [k] mwait_idle
0.83%             fio  [kernel.kallsyms]   [k] do_io_submit
0.77%             fio  [kernel.kallsyms]   [k] memset
0.70%             fio  [kernel.kallsyms]   [k] kmem_cache_alloc
0.65%             fio  [kernel.kallsyms]   [k] md5_transform
0.63%             fio  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
0.61%             fio  [kernel.kallsyms]   [k] _raw_spin_lock
0.58%             fio  [kernel.kallsyms]   [k] generic_make_request
0.57%             fio  libaio.so.1.0.1     [.] 0x6b7
0.50%             fio  [kernel.kallsyms]   [k] gup_pte_range

Haven't seen bio_insert() show up that high in a profile before; I wonder
what's up with that...

Reran the random read benchmark with the Intel:

root@utumno:/mnt# perf record -afg fio ~/rw4k 
randwrite: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [190.9M/0K /s] [47.7K/0  iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1575
  read : io=16384MB, bw=153120KB/s, iops=38279 , runt=109571msec
  cpu          : usr=7.22%, sys=52.15%, ctx=678086, majf=0, minf=85
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued r/w/d: total=4194367/0/0, short=0/0/0



Run status group 0 (all jobs):
   READ: io=16384MB, aggrb=153119KB/s, minb=156794KB/s, maxb=156794KB/s, mint=109571msec, maxt=109571msec

Disk stats (read/write):
  bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Basically, whatever hardware you have bcache will easily max it out.


* Re: Quick bcache benchmark
  2011-12-06 11:56     ` Kent Overstreet
@ 2011-12-06 14:10       ` Bostjan Skufca
       [not found]         ` <CAEp_DRDEQLSkJ3arx81qM1M4iSJ5Wy0dwZhsrYD=94682qw8JQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Bostjan Skufca @ 2011-12-06 14:10 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Nice, 22k iops for random write is not bad at all (especially compared
to spinning disks :)

I have a couple of questions; can you please confirm that I am
understanding bcache correctly:

1. When you issue that many random write requests, they get written to
the SSD first. Then they are slowly propagated from SSD to spinning
disk, right? In the original order, or is the order optimized?
2. What about when I unregister bcache from a device? Does it flush
changes from SSD to platter?
3. Same question (2) for unmounting a drive?
4. If the machine crashes, will bcache replay changes from SSD to
platter at mount time?
5. Does it export the number of writes that are pending on the SSD, via
some /proc or /sys interface?
6. Is the read cache hot or cold at boot time?

I know that is overkill for the wording "couple of questions", sorry :)

b.


On 6 December 2011 12:56, Kent Overstreet <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Tue, Dec 06, 2011 at 11:39:57AM +0100, Bostjan Skufca wrote:
> > Random write test?
>
> Sure.
>
> That corsair was giving me /terrible/ write performance, pulled the
> intel SSD out of my other machine (unregistered the cache from the
> backing device and attached the new SSD all without unmounting the
> filesystem :)
>
> Write performance with the intel is not /awesome/, but much more
> reasonable:
>
> root@utumno:/mnt# perf record -afg fio ~/rw4k
> randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
> fio 1.59
> Starting 1 process
> Jobs: 1 (f=1): [w] [100.0% done] [0K/98365K /s] [0 /24.2K iops] [eta 00m:00s]
> randwrite: (groupid=0, jobs=1): err= 0: pid=1560
>  write: io=16384MB, bw=89547KB/s, iops=22386 , runt=187359msec
>  cpu          : usr=3.94%, sys=14.82%, ctx=300435, majf=0, minf=19
>  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>     issued r/w/d: total=0/4194367/0, short=0/0/0
>
>
>
> Run status group 0 (all jobs):
>  WRITE: io=16384MB, aggrb=89547KB/s, minb=91696KB/s, maxb=91696KB/s, mint=187359msec, maxt=187359msec
>
> Disk stats (read/write):
>  bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> 8.97%             fio  fio                 [.] 0xd1b2
> 1.64%             fio  [kernel.kallsyms]   [k] bio_insert               <- first bcache function
> 1.56%             fio  [kernel.kallsyms]   [k] __blockdev_direct_IO
> 1.24%     kworker/1:2  [kernel.kallsyms]   [k] __bset_search
> 1.24%     kworker/0:0  [kernel.kallsyms]   [k] __bset_search
> 1.19%         swapper  [kernel.kallsyms]   [k] ahci_interrupt
> 1.17%     kworker/0:1  [kernel.kallsyms]   [k] __bset_search
> 1.17%     kworker/1:0  [kernel.kallsyms]   [k] __bset_search
> 1.09%             fio  [kernel.kallsyms]   [k] system_call
> 1.06%     kworker/0:2  [kernel.kallsyms]   [k] __bset_search
> 1.06%     kworker/1:1  [kernel.kallsyms]   [k] __bset_search
> 1.04%             fio  [kernel.kallsyms]   [k] ext4_ext_find_extent
> 0.96%             fio  [kernel.kallsyms]   [k] _raw_spin_lock_irq
> 0.92%             fio  [kernel.kallsyms]   [k] bcache_make_request
> 0.87%         swapper  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
> 0.85%         swapper  [kernel.kallsyms]   [k] mwait_idle
> 0.83%             fio  [kernel.kallsyms]   [k] do_io_submit
> 0.77%             fio  [kernel.kallsyms]   [k] memset
> 0.70%             fio  [kernel.kallsyms]   [k] kmem_cache_alloc
> 0.65%             fio  [kernel.kallsyms]   [k] md5_transform
> 0.63%             fio  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
> 0.61%             fio  [kernel.kallsyms]   [k] _raw_spin_lock
> 0.58%             fio  [kernel.kallsyms]   [k] generic_make_request
> 0.57%             fio  libaio.so.1.0.1     [.] 0x6b7
> 0.50%             fio  [kernel.kallsyms]   [k] gup_pte_range
>
> haven't seen bio_insert() show up that high in a profile before, wonder
> what's up with that..
>
> Reran the random read benchmark with the intel:
>
> root@utumno:/mnt# perf record -afg fio ~/rw4k
> randwrite: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
> fio 1.59
> Starting 1 process
> Jobs: 1 (f=1): [r] [100.0% done] [190.9M/0K /s] [47.7K/0  iops] [eta 00m:00s]
> randwrite: (groupid=0, jobs=1): err= 0: pid=1575
>  read : io=16384MB, bw=153120KB/s, iops=38279 , runt=109571msec
>  cpu          : usr=7.22%, sys=52.15%, ctx=678086, majf=0, minf=85
>  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>     issued r/w/d: total=4194367/0/0, short=0/0/0
>
>
>
> Run status group 0 (all jobs):
>   READ: io=16384MB, aggrb=153119KB/s, minb=156794KB/s, maxb=156794KB/s, mint=109571msec, maxt=109571msec
>
> Disk stats (read/write):
>  bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> Basically, whatever hardware you have bcache will easily max it out.


[parent not found: <CAEp_DRDEQLSkJ3arx81qM1M4iSJ5Wy0dwZhsrYD=94682qw8JQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]         ` <CAEp_DRDEQLSkJ3arx81qM1M4iSJ5Wy0dwZhsrYD=94682qw8JQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-06 17:02           ` Marcus Sorensen
       [not found]             ` <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-06 17:02 UTC (permalink / raw)
  To: Bostjan Skufca; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

I'm also curious as to how it decides what to keep in cache and what to
toss out, what to write direct to platter and what to buffer. I've been
testing LSI's cachecade 2.0 pro, and my intent is to post
some benchmarks between the two. From what I've seen you get at most
1/2 performance of your SSD if everything could fit into cache, I'm
not sure if that's due to their algorithm and how they decide what's
SSD worthy and what's not.


[parent not found: <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw@mail.gmail.com>]

[parent not found: <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]               ` <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-09 10:02                 ` Kent Overstreet
       [not found]                   ` <CAC7rs0vvJbN6iOvvKJ3Xgm5BAzBxBYL+e6_F_ZzfREEbnC9-CA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-09 10:02 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: Bostjan Skufca, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Weird. That wouldn't be blocksize - a tiny bucket size could cause
performance issues, but not consistent with what you describe.

Might be some sort of interaction with xfs, I'll have to see if I can
reproduce it.

On Thu, Dec 8, 2011 at 6:32 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Got to try this out quickly this afternoon. Used 200GB hardware raid1
> caching for 8 disk, 8T raid 10. Enabled writeback, put xfs on bcache0.
> Mkfs.xfs took awhile, which was unusual. I mounted the filesystem, created
> an 8GB file, which was fast. Then ran some 512b random reads against it(16
> threads), almost sad speed. Switched same test to random writes, and it was
> as slow as spindle. Some of the threads even threw "blocked for 120 seconds"
> traces. I wonder if my blocksize is set wrong on the cache, sort of hard to
> find the appropriate numbers.
>
> On Dec 6, 2011 10:02 AM, "Marcus Sorensen" <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>> I'm also curious as to how it decides what to keep in cache and what to
>> toss out, what to write direct to platter and what to buffer. I've been
>> testing LSI's cachecade 2.0 pro, and my intent is to post
>> some benchmarks between the two. From what I've seen you get at most
>> 1/2 performance of your SSD if everything could fit into cache, I'm
>> not sure if that's due to their algorithm and how they decide what's
>> SSD worthy and what's not.


[parent not found: <CAC7rs0vvJbN6iOvvKJ3Xgm5BAzBxBYL+e6_F_ZzfREEbnC9-CA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                   ` <CAC7rs0vvJbN6iOvvKJ3Xgm5BAzBxBYL+e6_F_ZzfREEbnC9-CA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-09 17:09                     ` Marcus Sorensen
       [not found]                       ` <CALFpzo6kXrC+8kkqrtRuMcsqnRL-oPc+B3A-Vq3wkWhRLBbAJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-09 17:09 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Here's some more info. I'm running kernel 3.1.4. When I do random
writes, the 'bypassed' number increases in stats. Now I'm random
writing direct to /dev/bcache0 and get the same result.

The application I'm using to test does the following:

1. looks at size of test file
2. divides size of file by a cmd line specified io size (tried 512b
and 4k) and considers that the blockcount for file
3. randomly selects a block number between 0 and blockcount
4. writes a random string of characters of blocksize to specified block
5. repeat 3 and 4
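
The five steps above can be sketched roughly as follows. This is not
seekmark's actual implementation, just a single-threaded illustration of
the same loop for a regular file; the function name random_write_test is
hypothetical, shuf is assumed to be available (GNU coreutils), and the
writes are ordinary buffered writes rather than O_DIRECT.

```shell
# Illustrative sketch of the random-write loop described above.
# Overwrites data in the target file in place; do not point it at anything
# you care about.
random_write_test() (
    file=$1       # regular file to overwrite in place
    io_size=$2    # bytes per write, e.g. 512 or 4096
    count=$3      # number of random writes to issue
    size=$(wc -c < "$file")                  # step 1: size of the test file
    blocks=$(( size / io_size ))             # step 2: block count
    [ "$blocks" -gt 0 ] || exit 1
    i=0
    while [ "$i" -lt "$count" ]; do          # step 5: repeat steps 3 and 4
        # step 3: pick a block number in 0..blocks-1
        block=$(shuf -i "0-$(( blocks - 1 ))" -n 1)
        # step 4: write io_size random bytes at that block, keeping file size
        dd if=/dev/urandom of="$file" bs="$io_size" count=1 \
           seek="$block" conv=notrunc 2>/dev/null
        i=$(( i + 1 ))
    done
)
```

For a block device such as /dev/bcache0 the size lookup in step 1 would
need something other than wc (the size is not available that way), so
treat this as a sketch for files only.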

I'll try a few other benchmark tools.

[root@sansrv2-10 bcache]# for i in `ls`; do echo -n "$i "; cat $i; done
label
readahead 0
running 1
sequential_cutoff 4.0M
sequential_merge 1
state dirty
verify 0
writeback 1
writeback_delay 30
writeback_metadata 1
writeback_percent 0
writeback_running 1

SSD benchmark:
[root@sansrv2-10 ~]# ./seekmark -t16 -q -w destroy-data -f /dev/sde

WRITE benchmarking against /dev/sde 218880 MB

total time: 5.39, time per WRITE request(ms): 0.067
14839.55 total seeks per sec, 927.47 WRITE seeks per sec per thread

bcache0 benchmark:

[root@sansrv2-10 ~]# ./seekmark -t16 -q -w destroy-data -f /dev/bcache0

WRITE benchmarking against /dev/bcache0 7628799 MB

total time: 510.75, time per WRITE request(ms): 6.384
156.63 total seeks per sec, 9.79 WRITE seeks per sec per thread
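
The two summaries are consistent with each other: the reported time per
request appears to be simply 1000 divided by the total seeks per second,
and rate times total time comes out at about 80,000 requests for both
runs, so bcache0 took roughly 95 times longer for the same amount of
work. A small sketch of that arithmetic (the helper name seek_summary is
made up for illustration):

```shell
# Derive per-request latency and total request count from a seekmark summary.
# Arguments: total seeks per second, total run time in seconds.
seek_summary() {
    awk -v rate="$1" -v secs="$2" 'BEGIN {
        printf "ms_per_request=%.3f requests=%.0f\n", 1000 / rate, rate * secs
    }'
}
```

Calling it as "seek_summary 14839.55 5.39" and "seek_summary 156.63 510.75"
covers the SSD and bcache0 runs respectively.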

There also seems to be some work needed with clean-up. Since I'm
unfamiliar with how bcache works, I attempted to make-bcache twice,
thinking I'd start over. That worked, but because my cache device was
already registered I was unable to re-register my newly formatted
cache dev; I got "kobject_add_internal failed for bcache with -EEXIST,
don't try to register things with the same name in the same
directory." I was still able to use my cache device via the old uuid,
but this will probably cause problems on reboot. Perhaps an unregister
file in /sys/fs/bcache would help. I also tried rmmod'ing bcache to
see if I could clear /sys/fs/bcache, but no luck. make-bcache should
perhaps check for an existing superblock, ask for confirmation, and
give some sort of instruction on how to unregister, or do it for you if
you reformat.

On Fri, Dec 9, 2011 at 3:02 AM, Kent Overstreet
<kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Weird. That wouldn't be blocksize - a tiny bucket size could cause
> performance issues, but not consistent with what you describe.
>
> Might be some sort of interaction with xfs, I'll have to see if I can
> reproduce it.
>
> On Thu, Dec 8, 2011 at 6:32 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Got to try this out quickly this afternoon. Used 200GB hardware raid1
>> caching for 8 disk, 8T raid 10. Enabled writeback, put xfs on bcache0.
>> Mkfs.xfs took awhile, which was unusual. I mounted the filesystem, created
>> an 8GB file, which was fast. Then ran some 512b random reads against it(16
>> threads), almost sad speed. Switched same test to random writes, and it was
>> as slow as spindle. Some of the threads even threw "blocked for 120 seconds"
>> traces. I wonder if my blocksize is set wrong on the cache, sort of hard to
>> find the appropriate numbers.
>>
>> On Dec 6, 2011 10:02 AM, "Marcus Sorensen" <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>
>>> I'm also curious as to how it decides what to keep in cache and what to
>>> toss out, what to write direct to platter and what to buffer. I've been
>>> testing LSI's cachecade 2.0 pro, and my intent is to post
>>> some benchmarks between the two. From what I've seen you get at most
>>> 1/2 performance of your SSD if everything could fit into cache, I'm
>>> not sure if that's due to their algorithm and how they decide what's
>>> SSD worthy and what's not.


[parent not found: <CALFpzo6kXrC+8kkqrtRuMcsqnRL-oPc+B3A-Vq3wkWhRLBbAJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                       ` <CALFpzo6kXrC+8kkqrtRuMcsqnRL-oPc+B3A-Vq3wkWhRLBbAJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-09 17:14                         ` Marcus Sorensen
  2011-12-10  6:33                         ` Kent Overstreet
  1 sibling, 0 replies; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-09 17:14 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Oh, and here are my stats after running that write benchmark on
bcache0. That's pretty much the only thing I've done in these stats.

[root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i; done 2>/dev/null
bypassed 605M
cache_bypass_hits 333
cache_bypass_misses 77553
cache_hit_ratio 0
cache_hits 85
cache_miss_collisions 9256
cache_misses 10031
cache_readaheads 0
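
With 85 hits against 10031 misses, the hit rate is under 1%, which would
explain a reported cache_hit_ratio of 0 if the value is an integer
percentage. A minimal sketch of that calculation, assuming the ratio is
hits * 100 / (hits + misses) with integer division (this formula is an
assumption, not something stated in the thread); the function name
hit_ratio is hypothetical.

```shell
# Integer hit percentage from a bcache stats directory (e.g. stats_day).
# Assumes ratio = hits * 100 / (hits + misses), truncated to an integer.
hit_ratio() (
    dir=$1
    hits=$(cat "$dir/cache_hits")
    misses=$(cat "$dir/cache_misses")
    total=$(( hits + misses ))
    if [ "$total" -eq 0 ]; then
        echo 0                      # avoid dividing by zero on a fresh cache
    else
        echo $(( hits * 100 / total ))
    fi
)
```

Run from the directory above as "hit_ratio ." to compare against the
value the kernel reports.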

On Fri, Dec 9, 2011 at 10:09 AM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Here's some more info. I'm running kernel 3.1.4. When I do random
> writes, the 'bypassed' number increases in stats. Now I'm random
> writing direct to /dev/bcache0 and get the same result.
>
> The application I'm using to test does the following:
>
> 1. looks at size of test file
> 2. divides size of file by a cmd line specified io size (tried 512b
> and 4k) and considers that the blockcount for file
> 3. randomly selects a block number between 0 and blockcount
> 4. writes a random string of characters of blocksize to specified block
> 5. repeat 3 and 4
>
> I'll try a few other benchmark tools.
>
> [root@sansrv2-10 bcache]# for i in `ls`; do echo -n "$i "; cat $i; done
> label
> readahead 0
> running 1
> sequential_cutoff 4.0M
> sequential_merge 1
> state dirty
> verify 0
> writeback 1
> writeback_delay 30
> writeback_metadata 1
> writeback_percent 0
> writeback_running 1
>
> SSD benchmark:
> [root@sansrv2-10 ~]# ./seekmark -t16 -q -w destroy-data -f /dev/sde
>
> WRITE benchmarking against /dev/sde 218880 MB
>
>
> total time: 5.39, time per WRITE request(ms): 0.067
> 14839.55 total seeks per sec, 927.47 WRITE seeks per sec per thread
>
> bcache0 benchmark:
>
> [root@sansrv2-10 ~]# ./seekmark -t16 -q -w destroy-data -f /dev/bcache0
>
> WRITE benchmarking against /dev/bcache0 7628799 MB
>
>
> total time: 510.75, time per WRITE request(ms): 6.384
> 156.63 total seeks per sec, 9.79 WRITE seeks per sec per thread
>
>
> There also seems to be some work needed with clean-up, since I'm
> unfamiliar with how bcache works I attempted to make-bcache twice,
> thinking I'd start over. That worked, but because my cache device was
> already registered I was unable to re-register my newly formatted
> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
> don't try to register things with the same name in the same
> directory." I was still able to use my cache device via the old uuid,
> but this will probably cause problems on reboot. Perhaps an unregister
> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
> perhaps check for an existing superblock, ask for confirmation, and
> give some sort instruction on how to unregister, or do it for you if
> you reformat.
>
> On Fri, Dec 9, 2011 at 3:02 AM, Kent Overstreet
> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Weird. That wouldn't be blocksize - a tiny bucket size could cause
>> performance issues, but not consistent with what you describe.
>>
>> Might be some sort of interaction with xfs, I'll have to see if I can
>> reproduce it.
>>
>> On Thu, Dec 8, 2011 at 6:32 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Got to try this out quickly this afternoon. Used 200GB hardware raid1
>>> caching for 8 disk, 8T raid 10. Enabled writeback, put xfs on bcache0.
>>> Mkfs.xfs took awhile, which was unusual. I mounted the filesystem, created
>>> an 8GB file, which was fast. Then ran some 512b random reads against it(16
>>> threads), almost sad speed. Switched same test to random writes, and it was
>>> as slow as spindle. Some of the threads even threw "blocked for 120 seconds"
>>> traces. I wonder if my blocksize is set wrong on the cache, sort of hard to
>>> find the appropriate numbers.
>>>
>>> On Dec 6, 2011 10:02 AM, "Marcus Sorensen" <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>
>>>> I'm also curious as to how it decides what to keep in cache and what to
>>>> toss out, what to write direct to platter and what to buffer. I've been
>>>> testing LSI's cachecade 2.0 pro, and my intent is to post
>>>> some benchmarks between the two. From what I've seen you get at most
>>>> 1/2 performance of your SSD if everything could fit into cache, I'm
>>>> not sure if that's due to their algorithm and how they decide what's
>>>> SSD worthy and what's not.


* Re: Quick bcache benchmark
       [not found]                       ` <CALFpzo6kXrC+8kkqrtRuMcsqnRL-oPc+B3A-Vq3wkWhRLBbAJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-12-09 17:14                         ` Marcus Sorensen
@ 2011-12-10  6:33                         ` Kent Overstreet
  2011-12-10 15:02                           ` Marcus Sorensen
  1 sibling, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-10  6:33 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
> Here's some more info. I'm running kernel 3.1.4. When I do random
> writes, the 'bypassed' number increases in stats. Now I'm random
> writing direct to /dev/bcache0 and get the same result.

Weird. From what you're describing it sounds like throttling is screwed
up (and it was recently), but I can't reproduce it now.

Can you try echoing 0 to congested_threshold_us in the cache set dir,
and seeing if that fixes it?
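
For anyone following along, a minimal sketch of that step. The set_attr
helper is a hypothetical name, and the cache set path in the comment is a
placeholder for whatever directory appears under /sys/fs/bcache on your
machine; only the congested_threshold_us file name comes from this thread.

```shell
#!/bin/sh
# set_attr DIR NAME VALUE (hypothetical helper): write VALUE to the
# attribute file DIR/NAME, then print the name and the value that reads
# back, so you can see whether the write took effect.
set_attr() {
    printf '%s\n' "$3" > "$1/$2" || return 1
    printf '%s %s\n' "$2" "$(cat "$1/$2")"
}

# Example, run as root (the path is a placeholder, not a real uuid):
# set_attr /sys/fs/bcache/<cache-set-uuid> congested_threshold_us 0
```

The read-back is only there to confirm the new value before rerunning the
benchmark.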

> There also seems to be some work needed with clean-up, since I'm
> unfamiliar with how bcache works I attempted to make-bcache twice,
> thinking I'd start over. That worked, but because my cache device was
> already registered I was unable to re-register my newly formatted
> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
> don't try to register things with the same name in the same
> directory." I was still able to use my cache device via the old uuid,
> but this will probably cause problems on reboot. Perhaps an unregister
> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
> perhaps check for an existing superblock, ask for confirmation, and
> give some sort of instruction on how to unregister, or do it for you if
> you reformat.

Yeah, I think for some reason bcache isn't opening the devices
exclusively on 3.1. I'll have a look...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Quick bcache benchmark
  2011-12-10  6:33                         ` Kent Overstreet
@ 2011-12-10 15:02                           ` Marcus Sorensen
       [not found]                             ` <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-10 15:02 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

That keeps the 'bypassed' value from increasing, but it doesn't change
write performance.

BEFORE:
[root@sansrv2-10 stats_day]# cat *
27.6M
83
3500
0
166
24380
40660
0

...benchmarking...

AFTER:

[root@sansrv2-10 stats_day]#  for i in `ls`; do echo -n "$i "; cat $i;
> done 2>/dev/null
bypassed 27.6M
cache_bypass_hits 83
cache_bypass_misses 3500
cache_hit_ratio 0
cache_hits 410
cache_miss_collisions 48879
cache_misses 80545
cache_readaheads 0

/sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d

average_key_size 0
block_size 2.0k
btree_cache_size 3.2M
bucket_size 1.0M
cache_available_percent 100
clear_stats congested 0
congested_threshold_us 0
dirty_data 0
io_error_halflife 0
io_error_limit 8
root_usage_percent 0
synchronous 1
tree_depth 1


On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
<kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>> Here's some more info. I'm running kernel 3.1.4. When I do random
>> writes, the 'bypassed' number increases in stats. Now I'm random
>> writing direct to /dev/bcache0 and get the same result.
>
> Weird. From what you're describing it sounds like throttling is screwed
> up (and it was recently), but I can't reproduce it now.
>
> Can you try echoing 0 to congested_threshold_us in the cache set dir,
> and seeing if that fixes it?
>
>> There also seems to be some work needed with clean-up, since I'm
>> unfamiliar with how bcache works I attempted to make-bcache twice,
>> thinking I'd start over. That worked, but because my cache device was
>> already registered I was unable to re-register my newly formatted
>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>> don't try to register things with the same name in the same
>> directory." I was still able to use my cache device via the old uuid,
>> but this will probably cause problems on reboot. Perhaps an unregister
>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>> perhaps check for an existing superblock, ask for confirmation, and
>> give some sort of instruction on how to unregister, or do it for you if
>> you reformat.
>
> Yeah, I think for some reason bcache isn't opening the devices
> exclusively on 3.1. I'll have a look...

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                             ` <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-15 23:40                               ` Marcus Sorensen
       [not found]                                 ` <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-15 23:40 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)

On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> That keeps the 'bypassed' value from increasing, but it doesn't change
> write performance.
>
> BEFORE:
> [root@sansrv2-10 stats_day]# cat *
> 27.6M
> 83
> 3500
> 0
> 166
> 24380
> 40660
> 0
>
> ...benchmarking...
>
> AFTER:
>
> [root@sansrv2-10 stats_day]#  for i in `ls`; do echo -n "$i "; cat $i;
>> done 2>/dev/null
> bypassed 27.6M
> cache_bypass_hits 83
> cache_bypass_misses 3500
> cache_hit_ratio 0
> cache_hits 410
> cache_miss_collisions 48879
> cache_misses 80545
> cache_readaheads 0
>
> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>
> average_key_size 0
> block_size 2.0k
> btree_cache_size 3.2M
> bucket_size 1.0M
> cache_available_percent 100
> clear_stats congested 0
> congested_threshold_us 0
> dirty_data 0
> io_error_halflife 0
> io_error_limit 8
> root_usage_percent 0
> synchronous 1
> tree_depth 1
>
>
> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>> writing direct to /dev/bcache0 and get the same result.
>>
>> Weird. From what you're describing it sounds like throttling is screwed
>> up (and it was recently), but I can't reproduce it now.
>>
>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>> and seeing if that fixes it?
>>
>>> There also seems to be some work needed with clean-up, since I'm
>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>> thinking I'd start over. That worked, but because my cache device was
>>> already registered I was unable to re-register my newly formatted
>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>> don't try to register things with the same name in the same
>>> directory." I was still able to use my cache device via the old uuid,
>>> but this will probably cause problems on reboot. Perhaps an unregister
>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>> perhaps check for an existing superblock, ask for confirmation, and
>>> give some sort of instruction on how to unregister, or do it for you if
>>> you reformat.
>>
>> Yeah, I think for some reason bcache isn't opening the devices
>> exclusively on 3.1. I'll have a look...

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                                 ` <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16  2:17                                   ` Kent Overstreet
       [not found]                                     ` <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-16  2:17 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Sorry, I was thinking about that issue for awhile and then I got distracted...

It's not user error, it's an irritating corner case. Basically, it's
the result of a workaround for a particularly obscure data corruption
bug.

If a write bypasses the cache, it has to invalidate that region of the
cache; the null key it leaves in the cache will block cache misses
from adding that data to the cache until the btree node fills up (and
possibly splits).

It hasn't been an issue for us in normal operation, but when you're
just testing - i.e. you don't have much load - that node split may not
happen for a long time, and so if for some reason a bunch of data
bypassed the cache... well, you see what happens.

Unfortunately a better solution to the original race is not going to
be simple, so it's probably not going to be done in the very near
future. It's a _very_ difficult race to hit, but in the meantime I'd
rather lose performance than corrupt data.

But the good news is if you put normal server-ish load on it the issue
should go away in steady state operation.

On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>
> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> That keeps the 'bypassed' value from increasing, but it doesn't change
>> write performance.
>>
>> BEFORE:
>> [root@sansrv2-10 stats_day]# cat *
>> 27.6M
>> 83
>> 3500
>> 0
>> 166
>> 24380
>> 40660
>> 0
>>
>> ...benchmarking...
>>
>> AFTER:
>>
>> [root@sansrv2-10 stats_day]#  for i in `ls`; do echo -n "$i "; cat $i;
>>> done 2>/dev/null
>> bypassed 27.6M
>> cache_bypass_hits 83
>> cache_bypass_misses 3500
>> cache_hit_ratio 0
>> cache_hits 410
>> cache_miss_collisions 48879
>> cache_misses 80545
>> cache_readaheads 0
>>
>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>
>> average_key_size 0
>> block_size 2.0k
>> btree_cache_size 3.2M
>> bucket_size 1.0M
>> cache_available_percent 100
>> clear_stats congested 0
>> congested_threshold_us 0
>> dirty_data 0
>> io_error_halflife 0
>> io_error_limit 8
>> root_usage_percent 0
>> synchronous 1
>> tree_depth 1
>>
>>
>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>> writing direct to /dev/bcache0 and get the same result.
>>>
>>> Weird. From what you're describing it sounds like throttling is screwed
>>> up (and it was recently), but I can't reproduce it now.
>>>
>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>> and seeing if that fixes it?
>>>
>>>> There also seems to be some work needed with clean-up, since I'm
>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>> thinking I'd start over. That worked, but because my cache device was
>>>> already registered I was unable to re-register my newly formatted
>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>> don't try to register things with the same name in the same
>>>> directory." I was still able to use my cache device via the old uuid,
>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>> give some sort of instruction on how to unregister, or do it for you if
>>>> you reformat.
>>>
>>> Yeah, I think for some reason bcache isn't opening the devices
>>> exclusively on 3.1. I'll have a look...
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                                     ` <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16  4:28                                       ` Marcus Sorensen
       [not found]                                         ` <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-16  4:28 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Thanks! I'll put it through some more tests. I kind of figured that
something more real-world would help.

On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet-hpIqsD4AKldhl2p70BpVqQ@public.gmane.org> wrote:
> Sorry, I was thinking about that issue for awhile and then I got distracted...
>
> It's not user error, it's an irritating corner case. Basically, it's
> the result of a workaround for a particularly obscure data corruption
> bug.
>
> If a write bypasses the cache, it has to invalidate that region of the
> cache; the null key it leaves in the cache will block cache misses
> from adding that data to the cache until the btree node fills up (and
> possibly splits).
>
> It hasn't been an issue for us in normal operation, but when you're
> just testing - i.e. you don't have much load - that node split may not
> happen for a long time, and so if for some reason a bunch of data
> bypassed the cache... well, you see what happens.
>
> Unfortunately a better solution to the original race is not going to
> be simple, so it's probably not going to be done in the very near
> future. It's a _very_ difficult race to hit, but in the meantime I'd
> rather lose performance than corrupt data.
>
> But the good news is if you put normal server-ish load on it the issue
> should go away in steady state operation.
>
> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>
>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor-Re5JQEeQqe8@public.gmane.org> wrote:
>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>> write performance.
>>>
>>> BEFORE:
>>> [root@sansrv2-10 stats_day]# cat *
>>> 27.6M
>>> 83
>>> 3500
>>> 0
>>> 166
>>> 24380
>>> 40660
>>> 0
>>>
>>> ...benchmarking...
>>>
>>> AFTER:
>>>
>>> [root@sansrv2-10 stats_day]#  for i in `ls`; do echo -n "$i "; cat $i;
>>>> done 2>/dev/null
>>> bypassed 27.6M
>>> cache_bypass_hits 83
>>> cache_bypass_misses 3500
>>> cache_hit_ratio 0
>>> cache_hits 410
>>> cache_miss_collisions 48879
>>> cache_misses 80545
>>> cache_readaheads 0
>>>
>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>
>>> average_key_size 0
>>> block_size 2.0k
>>> btree_cache_size 3.2M
>>> bucket_size 1.0M
>>> cache_available_percent 100
>>> clear_stats congested 0
>>> congested_threshold_us 0
>>> dirty_data 0
>>> io_error_halflife 0
>>> io_error_limit 8
>>> root_usage_percent 0
>>> synchronous 1
>>> tree_depth 1
>>>
>>>
>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>
>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>> up (and it was recently), but I can't reproduce it now.
>>>>
>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>> and seeing if that fixes it?
>>>>
>>>>> There also seems to be some work needed with clean-up, since I'm
>>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>> already registered I was unable to re-register my newly formatted
>>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>>> don't try to register things with the same name in the same
>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>> give some sort of instruction on how to unregister, or do it for you if
>>>>> you reformat.
>>>>
>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>> exclusively on 3.1. I'll have a look...

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                                         ` <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 18:49                                           ` Marcus Sorensen
       [not found]                                             ` <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-16 18:49 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Actually I think this IS user error. I ran a benchmark with FIO, and
the results were practically identical with and without bcache.  I
applied the 3.1.4 kernel patch on top of your 3.1 tree; even though it
applied cleanly, I'm guessing that wiped something out. Here are my
stats after running the benchmark on bcache, and also included is the
fio config.

bypassed 32.1G
cache_bypass_hits 5482
cache_bypass_misses 194862
cache_hit_ratio 3
cache_hits 786
cache_miss_collisions 206
cache_misses 19447
cache_readaheads 0
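
As a rough cross-check of cache_hit_ratio, here is a small sketch. It
assumes the ratio is simply hits as an integer percentage of hits plus
misses, truncated; that fits the numbers in this thread but is an
assumption, not something confirmed here. hit_ratio is a hypothetical
helper name.

```shell
#!/bin/sh
# hit_ratio HITS MISSES: integer percentage of lookups that were hits,
# truncated toward zero. Prints 0 when there were no lookups at all.
hit_ratio() {
    total=$(( $1 + $2 ))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 * 100 / total ))
    fi
}

hit_ratio 786 19447    # figures from the stats above; prints 3
hit_ratio 410 80545    # figures from the earlier stats_day listing; prints 0
```

Both results match the cache_hit_ratio values reported alongside those
counters (3 here, 0 in the earlier listing).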

[global]
ioengine=libaio
iodepth=4
invalidate=1 #make sure we're not cached locally
direct=1 #don't use buffers during test (test without local caches)
thread
ramp_time=20
time_based
runtime=180

[8RandomReadWriters]
rw=randrw
numjobs=8
blocksize=4k
size=1G

[2SequentialReadWriters]
rw=rw
numjobs=2
size=4G
blocksize_range=64k-1M


On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Thanks! I'll put it through some more tests. I kind of figured that
> something more real-world would help.
>
> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
>> Sorry, I was thinking about that issue for awhile and then I got distracted...
>>
>> It's not user error, it's an irritating corner case. Basically, it's
>> the result of a workaround for a particularly obscure data corruption
>> bug.
>>
>> If a write bypasses the cache, it has to invalidate that region of the
>> cache; the null key it leaves in the cache will block cache misses
>> from adding that data to the cache until the btree node fills up (and
>> possibly splits).
>>
>> It hasn't been an issue for us in normal operation, but when you're
>> just testing - i.e. you don't have much load - that node split may not
>> happen for a long time, and so if for some reason a bunch of data
>> bypassed the cache... well, you see what happens.
>>
>> Unfortunately a better solution to the original race is not going to
>> be simple, so it's probably not going to be done in the very near
>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>> rather lose performance than corrupt data.
>>
>> But the good news is if you put normal server-ish load on it the issue
>> should go away in steady state operation.
>>
>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8@public.gmane.org> wrote:
>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>
>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>> write performance.
>>>>
>>>> BEFORE:
>>>> [root@sansrv2-10 stats_day]# cat *
>>>> 27.6M
>>>> 83
>>>> 3500
>>>> 0
>>>> 166
>>>> 24380
>>>> 40660
>>>> 0
>>>>
>>>> ...benchmarking...
>>>>
>>>> AFTER:
>>>>
>>>> [root@sansrv2-10 stats_day]#  for i in `ls`; do echo -n "$i "; cat $i;
>>>>> done 2>/dev/null
>>>> bypassed 27.6M
>>>> cache_bypass_hits 83
>>>> cache_bypass_misses 3500
>>>> cache_hit_ratio 0
>>>> cache_hits 410
>>>> cache_miss_collisions 48879
>>>> cache_misses 80545
>>>> cache_readaheads 0
>>>>
>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>
>>>> average_key_size 0
>>>> block_size 2.0k
>>>> btree_cache_size 3.2M
>>>> bucket_size 1.0M
>>>> cache_available_percent 100
>>>> clear_stats congested 0
>>>> congested_threshold_us 0
>>>> dirty_data 0
>>>> io_error_halflife 0
>>>> io_error_limit 8
>>>> root_usage_percent 0
>>>> synchronous 1
>>>> tree_depth 1
>>>>
>>>>
>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>
>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>
>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>> and seeing if that fixes it?
>>>>>
>>>>>> There also seems to be some work needed with clean-up, since I'm
>>>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>> already registered I was unable to re-register my newly formatted
>>>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>> don't try to register things with the same name in the same
>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>> give some sort of instruction on how to unregister, or do it for you if
>>>>>> you reformat.
>>>>>
>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>> exclusively on 3.1. I'll have a look...

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                                             ` <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 18:52                                               ` Kent Overstreet
       [not found]                                                 ` <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-16 18:52 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

That's what you'd expect in writethrough mode when you aren't getting
any cache hits - try flipping on writeback and see what happens.

On Fri, Dec 16, 2011 at 10:49 AM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Actually I think this IS user error. I ran a benchmark with FIO, and
> the results were practically identical with and without bcache.  I
> applied the 3.1.4 kernel patch on top of your 3.1 tree; even though it
> applied cleanly, I'm guessing that wiped something out. Here are my
> stats after running the benchmark on bcache, and also included is the
> fio config.
>
> bypassed 32.1G
> cache_bypass_hits 5482
> cache_bypass_misses 194862
> cache_hit_ratio 3
> cache_hits 786
> cache_miss_collisions 206
> cache_misses 19447
> cache_readaheads 0
>
> [global]
> ioengine=libaio
> iodepth=4
> invalidate=1 #make sure we're not cached locally
> direct=1 #don't use buffers during test (test without local caches)
> thread
> ramp_time=20
> time_based
> runtime=180
>
> [8RandomReadWriters]
> rw=randrw
> numjobs=8
> blocksize=4k
> size=1G
>
> [2SequentialReadWriters]
> rw=rw
> numjobs=2
> size=4G
> blocksize_range=64k-1M
>
>
> On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Thanks! I'll put it through some more tests. I kind of figured that
>> something more real-world would help.
>>
>> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
>>> Sorry, I was thinking about that issue for awhile and then I got distracted...
>>>
>>> It's not user error, it's an irritating corner case. Basically, it's
>>> the result of a workaround for a particularly obscure data corruption
>>> bug.
>>>
>>> If a write bypasses the cache, it has to invalidate that region of the
>>> cache; the null key it leaves in the cache will block cache misses
>>> from adding that data to the cache until the btree node fills up (and
>>> possibly splits).
>>>
>>> It hasn't been an issue for us in normal operation, but when you're
>>> just testing - i.e. you don't have much load - that node split may not
>>> happen for a long time, and so if for some reason a bunch of data
>>> bypassed the cache... well, you see what happens.
>>>
>>> Unfortunately a better solution to the original race is not going to
>>> be simple, so it's probably not going to be done in the very near
>>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>>> rather lose performance than corrupt data.
>>>
>>> But the good news is if you put normal server-ish load on it the issue
>>> should go away in steady state operation.
>>>
>>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>>
>>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>>> write performance.
>>>>>
>>>>> BEFORE:
>>>>> [root@sansrv2-10 stats_day]# cat *
>>>>> 27.6M
>>>>> 83
>>>>> 3500
>>>>> 0
>>>>> 166
>>>>> 24380
>>>>> 40660
>>>>> 0
>>>>>
>>>>> ...benchmarking...
>>>>>
>>>>> AFTER:
>>>>>
>>>>> [root@sansrv2-10 stats_day]#  for i in `ls`; do echo -n "$i "; cat $i;
>>>>>> done 2>/dev/null
>>>>> bypassed 27.6M
>>>>> cache_bypass_hits 83
>>>>> cache_bypass_misses 3500
>>>>> cache_hit_ratio 0
>>>>> cache_hits 410
>>>>> cache_miss_collisions 48879
>>>>> cache_misses 80545
>>>>> cache_readaheads 0
>>>>>
>>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>>
>>>>> average_key_size 0
>>>>> block_size 2.0k
>>>>> btree_cache_size 3.2M
>>>>> bucket_size 1.0M
>>>>> cache_available_percent 100
>>>>> clear_stats congested 0
>>>>> congested_threshold_us 0
>>>>> dirty_data 0
>>>>> io_error_halflife 0
>>>>> io_error_limit 8
>>>>> root_usage_percent 0
>>>>> synchronous 1
>>>>> tree_depth 1
>>>>>
>>>>>
>>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>>
>>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>>
>>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>>> and seeing if that fixes it?
>>>>>>
>>>>>>> There also seems to be some work needed with clean-up, since I'm
>>>>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>>> already registered I was unable to re-register my newly formatted
>>>>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>>> don't try to register things with the same name in the same
>>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>>> give some sort of instruction on how to unregister, or do it for you if
>>>>>>> you reformat.
>>>>>>
>>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>>> exclusively on 3.1. I'll have a look...

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Quick bcache benchmark
       [not found]                                                 ` <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 22:45                                                   ` Marcus Sorensen
       [not found]                                                     ` <CALFpzo5rehRqabN=2C11eLTyr6khvBRwX1JaJGNdkguMs-Fueg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Sorensen @ 2011-12-16 22:45 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Yeah, I echoed 1 into writeback before doing the test. And why
wouldn't I get any cache hits?

On Fri, Dec 16, 2011 at 11:52 AM, Kent Overstreet
<koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> That's what you'd expect in writethrough mode when you aren't getting
> any cache hits - try flipping on writeback and see what happens.
>
> On Fri, Dec 16, 2011 at 10:49 AM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Actually I think this IS user error. I ran a benchmark with FIO, and
>> the results were practically identical with and without bcache.  I
>> applied the 3.1.4 kernel patch on top of your 3.1 tree, even though it
>> applied cleanly I'm guessing that wiped something out. Here are my
>> stats after running the benchmark on bcache, and also included is the
>> fio config.
>>
>> bypassed 32.1G
>> cache_bypass_hits 5482
>> cache_bypass_misses 194862
>> cache_hit_ratio 3
>> cache_hits 786
>> cache_miss_collisions 206
>> cache_misses 19447
>> cache_readaheads 0
>>
>> [global]
>> ioengine=libaio
>> iodepth=4
>> invalidate=1 #make sure we're not cached locally
>> direct=1 #don't use buffers during test (test without local caches)
>> thread
>> ramp_time=20
>> time_based
>> runtime=180
>>
>> [8RandomReadWriters]
>> rw=randrw
>> numjobs=8
>> blocksize=4k
>> size=1G
>>
>> [2SequentialReadWriters]
>> rw=rw
>> numjobs=2
>> size=4G
>> blocksize_range=64k-1M
>>
>>
>> On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Thanks! I'll put it through some more tests. I kind of figured that
>>> something more real-world would help.
>>>
>>> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
>>>> Sorry, I was thinking about that issue for awhile and then I got distracted...
>>>>
>>>> It's not user error, it's an irritating corner case. Basically, it's
>>>> the result of a workaround for a particularly obscure data corruption
>>>> bug.
>>>>
>>>> If a write bypasses the cache, it has to invalidate that region of the
>>>> cache; the null key it leaves in the cache will block cache misses
>>>> from adding that data to the cache until the btree node fills up (and
>>>> possibly splits).
>>>>
>>>> It hasn't been an issue for us in normal operation, but when you're
>>>> just testing - i.e. you don't have much load - that node split may not
>>>> happen for a long time, and so if for some reason a bunch of data
>>>> bypassed the cache... well, you see what happens.
>>>>
>>>> Unfortunately a better solution to the original race is not going to
>>>> be simple, so it's probably not going to be done in the very near
>>>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>>>> rather lose performance than corrupt data.
>>>>
>>>> But the good news is if you put normal server-ish load on it the issue
>>>> should go away in steady state operation.
>>>>
>>>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>>>
>>>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>>>> write performance.
>>>>>>
>>>>>> BEFORE:
>>>>>> [root@sansrv2-10 stats_day]# cat *
>>>>>> 27.6M
>>>>>> 83
>>>>>> 3500
>>>>>> 0
>>>>>> 166
>>>>>> 24380
>>>>>> 40660
>>>>>> 0
>>>>>>
>>>>>> ...benchmarking...
>>>>>>
>>>>>> AFTER:
>>>>>>
>>>>>> [root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i; done 2>/dev/null
>>>>>> bypassed 27.6M
>>>>>> cache_bypass_hits 83
>>>>>> cache_bypass_misses 3500
>>>>>> cache_hit_ratio 0
>>>>>> cache_hits 410
>>>>>> cache_miss_collisions 48879
>>>>>> cache_misses 80545
>>>>>> cache_readaheads 0
>>>>>>
>>>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>>>
>>>>>> average_key_size 0
>>>>>> block_size 2.0k
>>>>>> btree_cache_size 3.2M
>>>>>> bucket_size 1.0M
>>>>>> cache_available_percent 100
>>>>>> clear_stats congested 0
>>>>>> congested_threshold_us 0
>>>>>> dirty_data 0
>>>>>> io_error_halflife 0
>>>>>> io_error_limit 8
>>>>>> root_usage_percent 0
>>>>>> synchronous 1
>>>>>> tree_depth 1
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>>>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>>>
>>>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>>>
>>>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>>>> and seeing if that fixes it?
>>>>>>>
>>>>>>>> There also seems to be some work needed with clean-up, since I'm
>>>>>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>>>> already registered I was unable to re-register my newly formatted
>>>>>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>>>> don't try to register things with the same name in the same
>>>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>>>> give some sort of instruction on how to unregister, or do it for you if
>>>>>>>> you reformat.
>>>>>>>
>>>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>>>> exclusively on 3.1. I'll have a look...


* Re: Quick bcache benchmark
       [not found]                                                     ` <CALFpzo5rehRqabN=2C11eLTyr6khvBRwX1JaJGNdkguMs-Fueg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-16 23:33                                                       ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2011-12-16 23:33 UTC (permalink / raw)
  To: Marcus Sorensen; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Sounds like the cache isn't getting populated for some reason. Have
you tried disabling throttling? echo 0 > congested_threshold_us

With that off and writeback on you really ought to get some
performance improvement on random writes...
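
A minimal sketch of flipping those two knobs from a script, with a read-back so you can see the value actually changed. set_attr is a hypothetical helper, and the directory is passed in rather than hard-coded: congested_threshold_us lives in the cache set directory under /sys/fs/bcache/, while the location of the writeback file is not given in this thread, so treat that path as something to look up on your system.

```shell
#!/bin/sh
# Write a value to one attribute file and read it back to confirm it took.
# Usage: set_attr <directory> <attribute> <value>
set_attr() {
    path=$1/$2
    if [ ! -w "$path" ]; then
        echo "set_attr: $path is missing or not writable" >&2
        return 1
    fi
    printf '%s\n' "$3" > "$path" || return 1
    printf '%s = %s\n' "$2" "$(cat "$path")"
}

# Example (run as root; the cache set UUID is the one from this thread,
# and WRITEBACK_DIR is a placeholder for wherever your writeback file is):
#   set_attr /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d congested_threshold_us 0
#   set_attr "$WRITEBACK_DIR" writeback 1
```

Nothing runs until you call set_attr yourself, so sourcing the file is safe.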

On Fri, Dec 16, 2011 at 2:45 PM, Marcus Sorensen <shadowsor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Yeah, I echoed 1 into writeback before doing the test. And why
> wouldn't I get any cache hits?
>
> On Fri, Dec 16, 2011 at 11:52 AM, Kent Overstreet
> <koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> That's what you'd expect in writethrough mode when you aren't getting
>> any cache hits - try flipping on writeback and see what happens.
>>
>> On Fri, Dec 16, 2011 at 10:49 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>> Actually I think this IS user error. I ran a benchmark with FIO, and
>>> the results were practically identical with and without bcache.  I
>>> applied the 3.1.4 kernel patch on top of your 3.1 tree, even though it
>>> applied cleanly I'm guessing that wiped something out. Here are my
>>> stats after running the benchmark on bcache, and also included is the
>>> fio config.
>>>
>>> bypassed 32.1G
>>> cache_bypass_hits 5482
>>> cache_bypass_misses 194862
>>> cache_hit_ratio 3
>>> cache_hits 786
>>> cache_miss_collisions 206
>>> cache_misses 19447
>>> cache_readaheads 0
>>>
>>> [global]
>>> ioengine=libaio
>>> iodepth=4
>>> invalidate=1 #make sure we're not cached locally
>>> direct=1 #don't use buffers during test (test without local caches)
>>> thread
>>> ramp_time=20
>>> time_based
>>> runtime=180
>>>
>>> [8RandomReadWriters]
>>> rw=randrw
>>> numjobs=8
>>> blocksize=4k
>>> size=1G
>>>
>>> [2SequentialReadWriters]
>>> rw=rw
>>> numjobs=2
>>> size=4G
>>> blocksize_range=64k-1M
>>>
>>>
>>> On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>> Thanks! I'll put it through some more tests. I kind of figured that
>>>> something more real-world would help.
>>>>
>>>> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
>>>>> Sorry, I was thinking about that issue for awhile and then I got distracted...
>>>>>
>>>>> It's not user error, it's an irritating corner case. Basically, it's
>>>>> the result of a workaround for a particularly obscure data corruption
>>>>> bug.
>>>>>
>>>>> If a write bypasses the cache, it has to invalidate that region of the
>>>>> cache; the null key it leaves in the cache will block cache misses
>>>>> from adding that data to the cache until the btree node fills up (and
>>>>> possibly splits).
>>>>>
>>>>> It hasn't been an issue for us in normal operation, but when you're
>>>>> just testing - i.e. you don't have much load - that node split may not
>>>>> happen for a long time, and so if for some reason a bunch of data
>>>>> bypassed the cache... well, you see what happens.
>>>>>
>>>>> Unfortunately a better solution to the original race is not going to
>>>>> be simple, so it's probably not going to be done in the very near
>>>>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>>>>> rather lose performance than corrupt data.
>>>>>
>>>>> But the good news is if you put normal server-ish load on it the issue
>>>>> should go away in steady state operation.
>>>>>
>>>>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>>>>
>>>>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>>>>> write performance.
>>>>>>>
>>>>>>> BEFORE:
>>>>>>> [root@sansrv2-10 stats_day]# cat *
>>>>>>> 27.6M
>>>>>>> 83
>>>>>>> 3500
>>>>>>> 0
>>>>>>> 166
>>>>>>> 24380
>>>>>>> 40660
>>>>>>> 0
>>>>>>>
>>>>>>> ...benchmarking...
>>>>>>>
>>>>>>> AFTER:
>>>>>>>
>>>>>>> [root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i; done 2>/dev/null
>>>>>>> bypassed 27.6M
>>>>>>> cache_bypass_hits 83
>>>>>>> cache_bypass_misses 3500
>>>>>>> cache_hit_ratio 0
>>>>>>> cache_hits 410
>>>>>>> cache_miss_collisions 48879
>>>>>>> cache_misses 80545
>>>>>>> cache_readaheads 0
>>>>>>>
>>>>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>>>>
>>>>>>> average_key_size 0
>>>>>>> block_size 2.0k
>>>>>>> btree_cache_size 3.2M
>>>>>>> bucket_size 1.0M
>>>>>>> cache_available_percent 100
>>>>>>> clear_stats congested 0
>>>>>>> congested_threshold_us 0
>>>>>>> dirty_data 0
>>>>>>> io_error_halflife 0
>>>>>>> io_error_limit 8
>>>>>>> root_usage_percent 0
>>>>>>> synchronous 1
>>>>>>> tree_depth 1
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>>>>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>>>>
>>>>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>>>>
>>>>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>>>>> and seeing if that fixes it?
>>>>>>>>
>>>>>>>>> There also seems to be some work needed with clean-up, since I'm
>>>>>>>>> unfamiliar with how bcache works I attempted to make-bcache twice,
>>>>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>>>>> already registered I was unable to re-register my newly formatted
>>>>>>>>> cache dev, got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>>>>> don't try to register things with the same name in the same
>>>>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>>>>> file in /sys/fs/bcache would help, I also tried rmmod'ing bcache to
>>>>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>>>>> give some sort of instruction on how to unregister, or do it for you if
>>>>>>>>> you reformat.
>>>>>>>>
>>>>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>>>>> exclusively on 3.1. I'll have a look...


end of thread, last message 2011-12-16 23:33 UTC

Thread overview: 17+ messages
2011-12-06  8:22 Quick bcache benchmark Kent Overstreet
2011-12-06  8:22 ` Kent Overstreet
     [not found] ` <CAEp_DRCHQo1JyPZk6dKYZjJvxtaR7yxpEDtGE+uYK9n2dNb2Pw@mail.gmail.com>
     [not found]   ` <CAEp_DRCHQo1JyPZk6dKYZjJvxtaR7yxpEDtGE+uYK9n2dNb2Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-06 11:56     ` Kent Overstreet
2011-12-06 14:10       ` Bostjan Skufca
     [not found]         ` <CAEp_DRDEQLSkJ3arx81qM1M4iSJ5Wy0dwZhsrYD=94682qw8JQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-06 17:02           ` Marcus Sorensen
     [not found]             ` <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw@mail.gmail.com>
     [not found]               ` <CALFpzo6ugO-5KHvrszp0bAYHY9eT8ADebbBqwgM3Y9FRS7PnGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-09 10:02                 ` Kent Overstreet
     [not found]                   ` <CAC7rs0vvJbN6iOvvKJ3Xgm5BAzBxBYL+e6_F_ZzfREEbnC9-CA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-09 17:09                     ` Marcus Sorensen
     [not found]                       ` <CALFpzo6kXrC+8kkqrtRuMcsqnRL-oPc+B3A-Vq3wkWhRLBbAJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-09 17:14                         ` Marcus Sorensen
2011-12-10  6:33                         ` Kent Overstreet
2011-12-10 15:02                           ` Marcus Sorensen
     [not found]                             ` <CALFpzo71TRvx59U6n7xkd_DNejQrD9qj1tuOeir3w6NaT79bCA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-15 23:40                               ` Marcus Sorensen
     [not found]                                 ` <CALFpzo542=jHj5OB3qCSKCAvmig6t85VDhnuc++toO0O=z7brQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16  2:17                                   ` Kent Overstreet
     [not found]                                     ` <CAH+dOx+r7L2o9RSCdXsa0Nn+k=Ab9QXc60gBb7Mhb+huhcOQ1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16  4:28                                       ` Marcus Sorensen
     [not found]                                         ` <CALFpzo6-cpEqxAy5p7rje_CR08PE94Cbju==yRktQ_8s7dN4QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 18:49                                           ` Marcus Sorensen
     [not found]                                             ` <CALFpzo6r8YGXtUtTOua=nw0Nw_+FEdL71+JEwb65LeRkyuTGZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 18:52                                               ` Kent Overstreet
     [not found]                                                 ` <CAH+dOxJio6xJ-MkRkeJ34v+BEsBek5=iOz6bTjUuW8s4LwK5RQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 22:45                                                   ` Marcus Sorensen
     [not found]                                                     ` <CALFpzo5rehRqabN=2C11eLTyr6khvBRwX1JaJGNdkguMs-Fueg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-16 23:33                                                       ` Kent Overstreet
