New version up with fix for md and other block devices

* New version up with fix for md and other block devices
@ 2011-11-21 10:14 Kent Overstreet
       [not found] ` <CAOzFzEjdhWtS9Q538+rM6LJm0ncx_MZg++3TCag3jr68F2=1uA@mail.gmail.com>
       [not found] ` <20111121101402.GA17787-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
  0 siblings, 2 replies; 17+ messages in thread
From: Kent Overstreet @ 2011-11-21 10:14 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

I just pushed a new version, and it's only been lightly tested but
assuming I haven't screwed anything up it should work on
md/dm/rados/iscsi/etc. block devices.

I've only tested backing devices on raid devices; a while back someone
ran into a bug when the cache device was a raid1. I'm not sure if it
ever got fixed, and it took a while to reproduce, so if anyone feels
like testing it please let me know.

There was also a pair of very slow silent data corruption bugs found
and fixed a few weeks ago; I updated the public repository with the
fixes for those as soon as I had them but I was negligent in posting
about it. So the new code's better, and when we discovered those bugs we
added some more extensive verification code, so I don't expect to find
any more corruption bugs (fingers crossed).

But anyway, if anyone runs into bugs with this version please let me
know. I know the documentation is horribly out of date; sorry about
that, and hopefully I'll have time to work on it soon.

Hopefully I'll be able to post some real benchmarks soon too; it has
gotten phenomenally fast...


* Re: New version up with fix for md and other block devices
       [not found]   ` <CAOzFzEjdhWtS9Q538+rM6LJm0ncx_MZg++3TCag3jr68F2=1uA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-21 10:26     ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2011-11-21 10:26 UTC (permalink / raw)
  To: Joseph Glanville; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Mon, Nov 21, 2011 at 2:18 AM, Joseph Glanville
<joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org> wrote:
>
> Hi Kent,
>
> Great news!
>
> Just wondering if the bcache tree has been updated or will apply cleanly against 3.1?

It has, it's been rebased on top of v3.1.

> I will test on some more exotic stuff (iSCSI, md devices, other custom block devices etc) and get back to you if I find bugs.
> Are bugs best reported here on the list or is there another best way of doing so?

Cool! The list and this email are best.

> Willing to help out with some benchmarks on fast equipment that should be able to stress bcache a bit.

Benchmarks are definitely welcome, the only benchmarking I've had time
to do has been for specifically working on performance.

Also, I mostly develop and test on 2.6.34. I've done some stress
testing of the 3.1 version but no performance testing, so it is
possible there are performance bugs in the 3.1 version.

Thanks!


* Re: New version up with fix for md and other block devices
       [not found] ` <20111121101402.GA17787-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
@ 2011-11-29  6:10   ` Brad Campbell
       [not found]     ` <4ED47771.9030309-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Brad Campbell @ 2011-11-29  6:10 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 21/11/11 18:14, Kent Overstreet wrote:
> I just pushed a new version, and it's only been lightly tested but
> assuming I haven't screwed anything up it should work on
> md/dm/rados/iscsi/etc. block devices.
>

Great stuff! A couple of nitpicks while I get things set up to run some 
tests..

Documentation/bcache.txt states :
<--------->
To register your bcache devices automatically, you could add something like
this to an init script:
   echo /dev/sd* > /sys/fs/bcache/register_quiet

It'll look for bcache superblocks and ignore everything that doesn't 
have one.
<--------->

However, this never works for me. It bombs out on the first passed parameter:

root@test:~/bin# echo /dev/sd* /dev/md* > /sys/fs/bcache/register_quiet
bash: echo: write error: Invalid argument

I need to use this :
for i in /dev/sd? /dev/md* ; do [ -n "`/sbin/probe-bcache $i`" ] && echo $i > /sys/fs/bcache/register_quiet ; done

Now, it does not actually need the test in there, however that stops it 
spewing "write error: Invalid argument" onto the console when you echo a 
device that does not have a bcache superblock.
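
For an init script, the loop above could be split into two small functions,
roughly as sketched below. This is only a sketch: the function names and the
PROBE override are made up for illustration, and it assumes (as the one-liner
does) that probe-bcache prints something only for devices that carry a bcache
superblock. Each device name goes to the register file in its own write,
since passing several names at once is what failed above.

```sh
# Hypothetical helpers; only probe-bcache and the register_quiet path
# come from this thread.

# Print each existing device for which the probe command outputs anything.
# PROBE is an illustrative override so the loop can be tried without bcache.
bcache_candidates() {
    probe=${PROBE:-/sbin/probe-bcache}
    for dev in "$@"; do
        [ -e "$dev" ] || continue          # an unmatched glob stays literal
        if [ -n "$("$probe" "$dev" 2>/dev/null)" ]; then
            printf '%s\n' "$dev"
        fi
    done
}

# Send one device name per write to the register file given as $1.
register_each() {
    target=$1
    shift
    bcache_candidates "$@" | while IFS= read -r dev; do
        printf '%s\n' "$dev" > "$target"
    done
}
```

It would be called as:
register_each /sys/fs/bcache/register_quiet /dev/sd? /dev/md*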

It does NOT like you accidentally trying to register a device twice :

[   42.327890] ------------[ cut here ]------------
[   42.327994] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xb9/0xf0()
[   42.328042] Hardware name: To Be Filled By O.E.M.
[   42.328085] sysfs: cannot create duplicate filename '/devices/virtual/block/md10/bcache'
[   42.328132] Modules linked in: nfs ipt_MASQUERADE xt_tcpudp 
iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack 
nf_defrag_ipv4 ip_tables x_tables deflate zlib_deflate des_generic cbc 
ecb crypto_blkcipher sha1_generic md5 hmac crypto_hash cryptomgr aead 
crypto_algapi af_key fuse w83627ehf hwmon_vid netconsole configfs 
vhost_net powernow_k8 mperf kvm_amd kvm xhci_hcd k10temp i2c_piix4 
ohci_hcd ehci_hcd ahci usbcore libahci atl1c megaraid_sas [last 
unloaded: scsi_wait_scan]
[   42.329736] Pid: 3301, comm: bash Not tainted 3.1.0-g143cdea #1
[   42.329779] Call Trace:
[   42.329826]  [<ffffffff81034eeb>] ? warn_slowpath_common+0x7b/0xc0
[   42.329874]  [<ffffffff81034fe5>] ? warn_slowpath_fmt+0x45/0x50
[   42.329922]  [<ffffffff811210c9>] ? sysfs_add_one+0xb9/0xf0
[   42.329968]  [<ffffffff81121b69>] ? create_dir+0x79/0xe0
[   42.330048]  [<ffffffff81121c42>] ? sysfs_create_dir+0x72/0xb0
[   42.330099]  [<ffffffff811dc18f>] ? kobject_add_internal+0xaf/0x1e0
[   42.330149]  [<ffffffff811dc4c6>] ? kobject_add+0x46/0x70
[   42.330201]  [<ffffffff810995b0>] ? bdi_init+0x170/0x1c0
[   42.330247]  [<ffffffff811dbd3d>] ? kobject_init+0x2d/0xb0
[   42.330296]  [<ffffffff812a435d>] ? register_bcache+0x72d/0xac0
[   42.330344]  [<ffffffff81120222>] ? sysfs_write_file+0xd2/0x160
[   42.330391]  [<ffffffff810c7348>] ? vfs_write+0xc8/0x190
[   42.330436]  [<ffffffff810c750e>] ? sys_write+0x4e/0x90
[   42.330481]  [<ffffffff8140a87b>] ? system_call_fastpath+0x16/0x1b
[   42.330526] ---[ end trace 1183eef7ce845ca5 ]---
[   42.330573] kobject_add_internal failed for bcache with -EEXIST, don't try to register things with the same name in the same directory.
[   42.330628] Pid: 3301, comm: bash Tainted: G        W   3.1.0-g143cdea #1
[   42.330673] Call Trace:
[   42.330715]  [<ffffffff811dc22a>] ? kobject_add_internal+0x14a/0x1e0
[   42.330762]  [<ffffffff811dc4c6>] ? kobject_add+0x46/0x70
[   42.330808]  [<ffffffff810995b0>] ? bdi_init+0x170/0x1c0
[   42.330854]  [<ffffffff811dbd3d>] ? kobject_init+0x2d/0xb0
[   42.330901]  [<ffffffff812a435d>] ? register_bcache+0x72d/0xac0
[   42.330949]  [<ffffffff81120222>] ? sysfs_write_file+0xd2/0x160
[   42.330996]  [<ffffffff810c7348>] ? vfs_write+0xc8/0x190
[   42.331043]  [<ffffffff810c750e>] ? sys_write+0x4e/0x90
[   42.331093]  [<ffffffff8140a87b>] ? system_call_fastpath+0x16/0x1b
[   42.331153] bcache: Device md10 unregistered
[   47.909107] device vnet0 entered promiscuous mode
[   47.913482] br1: port 2(vnet0) entering forwarding state
[   47.913557] br1: port 2(vnet0) entering forwarding state
[   47.946957] br1: port 2(vnet0) entering forwarding state
[   47.947316] device vnet0 left promiscuous mode
[   47.947415] br1: port 2(vnet0) entering disabled state
[   48.174405] device vnet0 entered promiscuous mode
[   48.178681] br1: port 2(vnet0) entering forwarding state
[   48.178749] br1: port 2(vnet0) entering forwarding state
[   48.212849] br1: port 2(vnet0) entering forwarding state
[   48.213173] device vnet0 left promiscuous mode
[   48.213271] br1: port 2(vnet0) entering disabled state
[   48.910124] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[   48.910256] IP: [<ffffffff81041588>] get_next_timer_interrupt+0x138/0x250
[   48.910342] PGD 41df6d067 PUD 414205067 PMD 0
[   48.910482] Oops: 0000 [#1] SMP
[   48.910586] CPU 1
[   48.910622] Modules linked in: xt_state ipt_REJECT xt_CHECKSUM 
iptable_mangle nfs ipt_MASQUERADE xt_tcpudp iptable_filter iptable_nat 
nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables 
deflate zlib_deflate des_generic cbc ecb crypto_blkcipher sha1_generic 
md5 hmac crypto_hash cryptomgr aead crypto_algapi af_key fuse w83627ehf 
hwmon_vid netconsole configfs vhost_net powernow_k8 mperf kvm_amd kvm 
xhci_hcd k10temp i2c_piix4 ohci_hcd ehci_hcd ahci usbcore libahci atl1c 
megaraid_sas [last unloaded: scsi_wait_scan]
[   48.912398]
[   48.912438] Pid: 0, comm: kworker/0:0 Tainted: G        W 
3.1.0-g143cdea #1 To Be Filled By O.E.M. To Be Filled By O.E.M./890GX 
Extreme4 R2.0
[   48.912589] RIP: 0010:[<ffffffff81041588>]  [<ffffffff81041588>] 
get_next_timer_interrupt+0x138/0x250
[   48.912675] RSP: 0018:ffff88041dc9fe78  EFLAGS: 00010003
[   48.912717] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
ffff88041dc99220
[   48.912762] RDX: 0000000000000001 RSI: 0000000000000020 RDI: 
ffff88041dc99020
[   48.912807] RBP: 00000000ffff9deb R08: 000000000000001e R09: 
0000000000ffff9e
[   48.912852] R10: ffff88041dc9fe90 R11: ffff88041dc9fea8 R12: 
ffff88041dc98000
[   48.912896] R13: 0000000000000040 R14: ffff88042fc4c480 R15: 
00000000ffff9deb
[   48.912942] FS:  00007f2a2b4f17e0(0000) GS:ffff88042fc40000(0000) 
knlGS:0000000000000000
[   48.912991] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   48.913033] CR2: 0000000000000018 CR3: 00000004142bd000 CR4: 
00000000000006e0
[   48.913078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[   48.913121] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[   48.913167] Process kworker/0:0 (pid: 0, threadinfo ffff88041dc9e000, 
task ffff88041dc6be80)
[   48.913214] Stack:
[   48.913253]  0000000000000000 ffff88042fc4cbe0 ffff88041dc99020 
ffff88041dc99420
[   48.913427]  ffff88041dc99820 ffff88041dc99c20 ffff88042fc4cb00 
ffff88042fc4cbe0
[   48.913599]  0000000000000001 0000000000000286 0000000b6344e813 
ffffffff8105f371
[   48.913772] Call Trace:
[   48.913818]  [<ffffffff8105f371>] ? tick_nohz_stop_sched_tick+0x2d1/0x3d0
[   48.913867]  [<ffffffff8100167f>] ? cpu_idle+0x2f/0xc0
[   48.913909] Code: 00 00 48 89 44 24 28 45 89 c8 41 83 e0 3f 44 89 c6 
66 90 48 63 ce 48 c1 e1 04 48 8b 04 39 48 8d 0c 0f 48 39 c8 74 22 0f 1f 
40 00 <f6> 40 18 01 75 10 48 8b 50 10 48 39 da 48 0f 48 da ba 01 00 00
[   48.916028] RIP  [<ffffffff81041588>] 
get_next_timer_interrupt+0x138/0x250
[   48.916107]  RSP <ffff88041dc9fe78>
[   48.916146] CR2: 0000000000000018
[   48.916187] ---[ end trace 1183eef7ce845ca6 ]---
[   48.916229] Kernel panic - not syncing: Fatal exception
[   48.916273] Pid: 0, comm: kworker/0:0 Tainted: G      D W 
3.1.0-g143cdea #1
[   48.916318] Call Trace:
[   48.916368]  [<ffffffff81407679>] ? panic+0x92/0x193
[   48.916414]  [<ffffffff81035451>] ? kmsg_dump+0x41/0xf0
[   48.916461]  [<ffffffff8100504d>] ? oops_end+0x8d/0xa0
[   48.916507]  [<ffffffff81020c7b>] ? no_context+0xfb/0x260
[   48.916553]  [<ffffffff810214f9>] ? do_page_fault+0x2b9/0x430
[   48.916599]  [<ffffffff81031cda>] ? load_balance+0x8a/0x5b0
[   48.916644]  [<ffffffff8140a46f>] ? page_fault+0x1f/0x30
[   48.916690]  [<ffffffff81041588>] ? get_next_timer_interrupt+0x138/0x250
[   48.916736]  [<ffffffff8104148a>] ? get_next_timer_interrupt+0x3a/0x250
[   48.916783]  [<ffffffff8105f371>] ? tick_nohz_stop_sched_tick+0x2d1/0x3d0
[   48.916830]  [<ffffffff8100167f>] ? cpu_idle+0x2f/0xc0
[   48.916907] Rebooting in 10 seconds..

I had a bug in my init script that inadvertently registered /dev/md10 
twice. I had about a 6 second window to ssh into the machine and disable 
the init script as it just sat there in a boot/panic/reboot loop.

This happens as soon as I try to attach a cache set to /dev/md10:

[   73.287556] md: resync of RAID array md10
[   73.287614] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   73.287620] ------------[ cut here ]------------
[   73.287628] kernel BUG at drivers/scsi/scsi_lib.c:1152!
[   73.287633] invalid opcode: 0000 [#1] SMP
[   73.287638] CPU 1
[   73.287641] Modules linked in: xt_state ipt_REJECT xt_CHECKSUM 
iptable_mangle nfs ipt_MASQUERADE xt_tcpudp iptable_filter iptable_nat 
nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables 
deflate zlib_deflate des_generic cbc ecb crypto_blkcipher sha1_generic 
md5 hmac crypto_hash cryptomgr aead crypto_algapi af_key fuse w83627ehf 
hwmon_vid netconsole configfs vhost_net powernow_k8 mperf kvm_amd kvm 
xhci_hcd k10temp i2c_piix4 ohci_hcd ehci_hcd usbcore ahci atl1c libahci 
megaraid_sas [last unloaded: scsi_wait_scan]
[   73.287694]
[   73.287700] Pid: 1428, comm: md10_raid10 Not tainted 3.1.0-g143cdea 
#1 To Be Filled By O.E.M. To Be Filled By O.E.M./890GX Extreme4 R2.0
[   73.287711] RIP: 0010:[<ffffffff812adbfe>]  [<ffffffff812adbfe>] 
scsi_setup_fs_cmnd+0xae/0xf0
[   73.287726] RSP: 0018:ffff88041bb73be0  EFLAGS: 00010046
[   73.287731] RAX: 0000000000000000 RBX: ffff88041ba5c560 RCX: 
0000000000001000
[   73.287736] RDX: 0000000000000000 RSI: ffff88041ba5c560 RDI: 
ffff88041bbf8000
[   73.287740] RBP: ffff88041bbf8000 R08: 0000000000000000 R09: 
0000000000000001
[   73.287745] R10: 4080000000000000 R11: dead000000100100 R12: 
ffff88041bbf8000
[   73.287750] R13: 0000000000000808 R14: ffff88041bbf8048 R15: 
ffff88041c029400
[   73.287756] FS:  00007fa5803717c0(0000) GS:ffff88042fc40000(0000) 
knlGS:0000000000000000
[   73.287761] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   73.287766] CR2: 0000000001b7c1b8 CR3: 0000000418a24000 CR4: 
00000000000006e0
[   73.287770] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[   73.287774] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[   73.287780] Process md10_raid10 (pid: 1428, threadinfo 
ffff88041bb72000, task ffff88041c760640)
[   73.287784] Stack:
[   73.287787]  0000000000000000 ffff88041ba5c560 ffff88041c603c30 
ffffffff812b86fc
[   73.287794]  01ff88041bbf8800 ffff880400000001 ffff880400001000 
0000000000000000
[   73.287800]  ffff88041ba5c560 ffff88041ba5c560 ffff88041ba5c560 
ffff88041c603c30
[   73.287807] Call Trace:
[   73.287816]  [<ffffffff812b86fc>] ? sd_prep_fn+0x15c/0xa50
[   73.287825]  [<ffffffff811c8be4>] ? blk_peek_request+0xb4/0x1d0
[   73.287832]  [<ffffffff812ad120>] ? scsi_request_fn+0x50/0x4a0
[   73.287840]  [<ffffffff811c9598>] ? blk_flush_plug_list+0x188/0x210
[   73.287847]  [<ffffffff811c962b>] ? blk_finish_plug+0xb/0x30
[   73.287854]  [<ffffffff812fae78>] ? raid10d+0x908/0xb50
[   73.287862]  [<ffffffff81041183>] ? lock_timer_base+0x33/0x70
[   73.287870]  [<ffffffff814089c5>] ? schedule_timeout+0x1c5/0x230
[   73.287878]  [<ffffffff8130ec8f>] ? md_thread+0x10f/0x140
[   73.287886]  [<ffffffff81050240>] ? wake_up_bit+0x40/0x40
[   73.287892]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
[   73.287898]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
[   73.287905]  [<ffffffff8104fde6>] ? kthread+0x96/0xa0
[   73.287912]  [<ffffffff8140bbf4>] ? kernel_thread_helper+0x4/0x10
[   73.287920]  [<ffffffff8104fd50>] ? kthread_worker_fn+0x120/0x120
[   73.287926]  [<ffffffff8140bbf0>] ? gs_change+0xb/0xb
[   73.287929] Code: 80 00 00 00 00 48 83 c4 08 5b 5d c3 90 48 89 ef be 
20 00 00 00 e8 73 a2 ff ff 48 85 c0 48 89 c7 74 d7 48 89 83 d8 00 00 00 
eb 8d <0f> 0b eb fe 48 8b 00 48 85 c0 0f 84 67 ff ff ff 48 8b 40 50 48
[   73.287969] RIP  [<ffffffff812adbfe>] scsi_setup_fs_cmnd+0xae/0xf0
[   73.287977]  RSP <ffff88041bb73be0>
[   73.287982] ---[ end trace 4ce4e575167cc0ff ]---
[   73.287986] Kernel panic - not syncing: Fatal exception
[   73.287993] Pid: 1428, comm: md10_raid10 Tainted: G      D 
3.1.0-g143cdea #1
[   73.287998] Call Trace:
[   73.288005]  [<ffffffff81407679>] ? panic+0x92/0x193
[   73.288012]  [<ffffffff81035451>] ? kmsg_dump+0x41/0xf0
[   73.288021]  [<ffffffff8100504d>] ? oops_end+0x8d/0xa0
[   73.288028]  [<ffffffff81002e34>] ? do_invalid_op+0x84/0xa0
[   73.288035]  [<ffffffff812adbfe>] ? scsi_setup_fs_cmnd+0xae/0xf0
[   73.288044]  [<ffffffff811d698e>] ? cfq_set_request+0x15e/0x3b0
[   73.288050]  [<ffffffff8140ba75>] ? invalid_op+0x15/0x20
[   73.288058]  [<ffffffff812adbfe>] ? scsi_setup_fs_cmnd+0xae/0xf0
[   73.288064]  [<ffffffff812b86fc>] ? sd_prep_fn+0x15c/0xa50
[   73.288071]  [<ffffffff811c8be4>] ? blk_peek_request+0xb4/0x1d0
[   73.288078]  [<ffffffff812ad120>] ? scsi_request_fn+0x50/0x4a0
[   73.288085]  [<ffffffff811c9598>] ? blk_flush_plug_list+0x188/0x210
[   73.288092]  [<ffffffff811c962b>] ? blk_finish_plug+0xb/0x30
[   73.288098]  [<ffffffff812fae78>] ? raid10d+0x908/0xb50
[   73.288104]  [<ffffffff81041183>] ? lock_timer_base+0x33/0x70
[   73.288112]  [<ffffffff814089c5>] ? schedule_timeout+0x1c5/0x230
[   73.288119]  [<ffffffff8130ec8f>] ? md_thread+0x10f/0x140
[   73.288126]  [<ffffffff81050240>] ? wake_up_bit+0x40/0x40
[   73.288132]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
[   73.288138]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
[   73.288144]  [<ffffffff8104fde6>] ? kthread+0x96/0xa0
[   73.288151]  [<ffffffff8140bbf4>] ? kernel_thread_helper+0x4/0x10
[   73.288159]  [<ffffffff8104fd50>] ? kthread_worker_fn+0x120/0x120
[   73.288165]  [<ffffffff8140bbf0>] ? gs_change+0xb/0xb
[   73.290590] Rebooting in 10 seconds..

I can attach the same cache set to /dev/sde and all is ok (the same 
config I used to run the last set of benchmarks).

Regards,
Brad


* Re: New version up with fix for md and other block devices
       [not found]     ` <4ED47771.9030309-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
@ 2011-11-29  6:31       ` Kent Overstreet
       [not found]         ` <20111129063126.GA14194-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
  2011-11-29  9:16       ` Kent Overstreet
  1 sibling, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-11-29  6:31 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 29, 2011 at 02:10:57PM +0800, Brad Campbell wrote:
> On 21/11/11 18:14, Kent Overstreet wrote:
> >I just pushed a new version, and it's only been lightly tested but
> >assuming I haven't screwed anything up it should work on
> >md/dm/rados/iscsi/etc. block devices.
> >
> 
> Great stuff! A couple of nitpicks while I get things set up to run
> some tests..
> 
> Documentation/bcache.txt states :
> <--------->
> To register your bcache devices automatically, you could add something like
> this to an init script:
>   echo /dev/sd* > /sys/fs/bcache/register_quiet
> 
> It'll look for bcache superblocks and ignore everything that doesn't
> have one.
> <--------->
> 
> However this never works for me. It bombs out on the first passed parameter
> 
> root@test:~/bin# echo /dev/sd* /dev/md* > /sys/fs/bcache/register_quiet
> bash: echo: write error: Invalid argument

Whoops, good catch - yeah, the documentation is completely wrong. Fixing
it now.

> I need to use this :
> for i in /dev/sd? /dev/md* ; do [ -n "`/sbin/probe-bcache $i`" ] &&
> echo $i > /sys/fs/bcache/register_quiet ; done
> 
> Now, it does not actually need the test in there, however that stops
> it spewing "write error: Invalid argument" onto the console when you
> echo a device that does not have a bcache superblock.
> 
> It does NOT like you accidentally trying to register a device twice :

Eesh. That's annoying.

I suppose really the bcache symlink in the /sys/block/bcacheN directory
is incorrect - without that stacking _ought_ to work and I'm not sure
it's possible to reliably detect stacking anyways.

Very annoying though - that symlink is very handy. Argh.

hm. maybe I could check the make_request_fn to detect stacking and
prevent it that way... 

No other issues, I take it?


* Re: New version up with fix for md and other block devices
       [not found]         ` <20111129063126.GA14194-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
@ 2011-11-29  7:31           ` Brad Campbell
       [not found]             ` <4ED48A64.4080406-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Brad Campbell @ 2011-11-29  7:31 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 29/11/11 14:31, Kent Overstreet wrote:


>> I need to use this :
>> for i in /dev/sd? /dev/md* ; do [ -n "`/sbin/probe-bcache $i`" ] &&
>> echo $i > /sys/fs/bcache/register_quiet ; done
>>
>> Now, it does not actually need the test in there, however that stops
>> it spewing "write error: Invalid argument" onto the console when you
>> echo a device that does not have a bcache superblock.
>>
>> It does NOT like you accidentally trying to register a device twice :
>
> Eesh. That's annoying.
>
> I suppose really the bcache symlink in the /sys/block/bcacheN directory
> is incorrect - without that stacking _ought_ to work and I'm not sure
> it's possible to reliably detect stacking anyways.

I'm not sure that stacking is the issue. I simply did

echo /dev/md10 > /sys/fs/bcache/register
echo /dev/md10 > /sys/fs/bcache/register

at that point it all came crashing down. I'd have thought simply 
detecting that a particular device was already registered would solve 
the problem.
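
Until the kernel side rejects the second registration, an init script could
guard against it from userspace, along the lines of the sketch below. It
assumes that a registered whole device shows a "bcache" entry in its sysfs
directory, which is what the duplicate-filename warning suggests; the function
names and the SYSFS_ROOT override are invented for illustration. The check is
racy and does not handle partitions, so it is a stopgap rather than a fix.

```sh
# Hypothetical guard; SYSFS_ROOT exists only so the check can be tried
# against a fake directory tree and defaults to the real /sys.

# Succeed if the device already has a "bcache" entry in sysfs.
already_registered() {
    name=${1##*/}                          # /dev/md10 -> md10
    [ -e "${SYSFS_ROOT:-/sys}/block/$name/bcache" ]
}

# Register the device unless it already looks registered.
register_once() {
    if already_registered "$1"; then
        echo "skipping $1: already registered" >&2
        return 0
    fi
    printf '%s\n' "$1" > "${SYSFS_ROOT:-/sys}/fs/bcache/register"
}
```

With that in place, calling register_once /dev/md10 twice should only write
to the register file the first time, provided the assumption about the sysfs
entry holds.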

> Very annoying though - that symlink is very handy. Argh.
>
> hm. maybe I could check the make_request_fn to detect stacking and
> prevent it that way...
>
> No other issues, I take it?

Well, I had intended to run some tests with it stacked on top of md, but 
as I pointed out in the last oops in my prior mail, every time I try and 
attach the cache set to /dev/md10 the machine panics, so I've not really 
progressed to actually trying things out. I figured re-running the tests 
I'd already run with it stacked on a single drive was pretty pointless.

Regards,
Brad


* Re: New version up with fix for md and other block devices
       [not found]             ` <4ED48A64.4080406-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
@ 2011-11-29  7:54               ` Kent Overstreet
       [not found]                 ` <20111129075440.GB14194-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-11-29  7:54 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 29, 2011 at 03:31:48PM +0800, Brad Campbell wrote:
> I'm not sure that stacking is the issue. I simply did
> 
> echo /dev/md10 > /sys/fs/bcache/register
> echo /dev/md10 > /sys/fs/bcache/register
> 
> at that point it all came crashing down. I'd have thought simply
> detecting that a particular device was already registered would
> solve the problem.

Ok, that's weird. It shouldn't be able to register the second time
because the first register opens it exclusively, and the second open
will fail with -EBUSY.

> Well, I had intended to run some tests with it stacked on top of md,
> but as I pointed out in the last oops in my prior mail, every time I
> try and attach the cache set to /dev/md10 the machine panics, so
> I've not really progressed to actually trying things out. I figured
> re-running the tests I'd already run with it stacked on a single
> drive was pretty pointless.

Bah, I suck at reading comprehension tonight, didn't see the second
oops.

That one looks strange, I haven't seen an oops there before. Which raid
type are you using, and which driver? Hopefully it's related to the raid
type and not the driver...


* Re: New version up with fix for md and other block devices
       [not found]                 ` <20111129075440.GB14194-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
@ 2011-11-29  8:30                   ` Brad Campbell
       [not found]                     ` <4ED4981E.6040501-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Brad Campbell @ 2011-11-29  8:30 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 29/11/11 15:54, Kent Overstreet wrote:
> On Tue, Nov 29, 2011 at 03:31:48PM +0800, Brad Campbell wrote:
>> I'm not sure that stacking is the issue. I simply did
>>
>> echo /dev/md10 > /sys/fs/bcache/register
>> echo /dev/md10 > /sys/fs/bcache/register
>>
>> at that point it all came crashing down. I'd have thought simply
>> detecting that a particular device was already registered would
>> solve the problem.
>
> Ok, that's weird. It shouldn't be able to register the second time
> because the first register opens it exclusively, and the second open
> will fail with -EBUSY.

Can reproduce it at will here.

Just to prove it wasn't a fluke and is not related to md10 :

[ 7991.108197] ------------[ cut here ]------------
[ 7991.108238] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0xb9/0xf0()
[ 7991.108256] Hardware name: To Be Filled By O.E.M.
[ 7991.108272] sysfs: cannot create duplicate filename 
'/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host5/target5:0:13/5:0:13:0/block/sde/bcache'
[ 7991.108299] Modules linked in: xt_state ipt_REJECT xt_CHECKSUM 
iptable_mangle nfs ipt_MASQUERADE xt_tcpudp iptable_filter iptable_nat 
nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables 
deflate zlib_deflate des_generic cbc ecb crypto_blkcipher sha1_generic 
md5 hmac crypto_hash cryptomgr aead crypto_algapi af_key fuse w83627ehf 
hwmon_vid netconsole configfs vhost_net powernow_k8 mperf kvm_amd kvm 
xhci_hcd k10temp i2c_piix4 ohci_hcd ehci_hcd usbcore ahci libahci atl1c 
megaraid_sas [last unloaded: scsi_wait_scan]
[ 7991.108667] Pid: 16579, comm: bash Not tainted 3.1.0-g143cdea #1
[ 7991.108684] Call Trace:
[ 7991.108705]  [<ffffffff81034eeb>] ? warn_slowpath_common+0x7b/0xc0
[ 7991.108726]  [<ffffffff81034fe5>] ? warn_slowpath_fmt+0x45/0x50
[ 7991.108747]  [<ffffffff811210c9>] ? sysfs_add_one+0xb9/0xf0
[ 7991.108767]  [<ffffffff81121b69>] ? create_dir+0x79/0xe0
[ 7991.108788]  [<ffffffff81121c42>] ? sysfs_create_dir+0x72/0xb0
[ 7991.108807]  [<ffffffff811dc18f>] ? kobject_add_internal+0xaf/0x1e0
[ 7991.108826]  [<ffffffff811dc4c6>] ? kobject_add+0x46/0x70
[ 7991.108847]  [<ffffffff810995b0>] ? bdi_init+0x170/0x1c0
[ 7991.108864]  [<ffffffff811dbd3d>] ? kobject_init+0x2d/0xb0
[ 7991.108885]  [<ffffffff812a435d>] ? register_bcache+0x72d/0xac0
[ 7991.108908]  [<ffffffff81120222>] ? sysfs_write_file+0xd2/0x160
[ 7991.108928]  [<ffffffff810c7348>] ? vfs_write+0xc8/0x190
[ 7991.108946]  [<ffffffff810c750e>] ? sys_write+0x4e/0x90
[ 7991.108966]  [<ffffffff8140a87b>] ? system_call_fastpath+0x16/0x1b
[ 7991.108984] ---[ end trace 88a7af6bca09c44d ]---
[ 7991.109002] kobject_add_internal failed for bcache with -EEXIST, 
don't try to register things with the same name in the same directory.
[ 7991.109030] Pid: 16579, comm: bash Tainted: G        W 
3.1.0-g143cdea #1
[ 7991.109048] Call Trace:
[ 7991.109064]  [<ffffffff811dc22a>] ? kobject_add_internal+0x14a/0x1e0
[ 7991.109083]  [<ffffffff811dc4c6>] ? kobject_add+0x46/0x70
[ 7991.109103]  [<ffffffff810995b0>] ? bdi_init+0x170/0x1c0
[ 7991.109122]  [<ffffffff811dbd3d>] ? kobject_init+0x2d/0xb0
[ 7991.109142]  [<ffffffff812a435d>] ? register_bcache+0x72d/0xac0
[ 7991.109163]  [<ffffffff81120222>] ? sysfs_write_file+0xd2/0x160
[ 7991.109182]  [<ffffffff810c7348>] ? vfs_write+0xc8/0x190
[ 7991.109201]  [<ffffffff810c750e>] ? sys_write+0x4e/0x90
[ 7991.109220]  [<ffffffff8140a87b>] ? system_call_fastpath+0x16/0x1b
[ 7991.109259] bcache: Device sde unregistered


>> Well, I had intended to run some tests with it stacked on top of md,
>> but as I pointed out in the last oops in my prior mail, every time I
>> try and attach the cache set to /dev/md10 the machine panics, so
>> I've not really progressed to actually trying things out. I figured
>> re-running the tests I'd already run with it stacked on a single
>> drive was pretty pointless.
>
> Bah, I suck at reading comprehension tonight, didn't see the second
> oops.
>
> That one looks strange, I haven't seen an oops there before. Which raid
> type are you using, and which driver? Hopefully it's related to the raid
> type and not the driver...


I doubt it's the driver, as the RAID is on the same card as the single
drive I tested with last time.

brad@test:~$ sudo mdadm --detail /dev/md10
/dev/md10:
         Version : 1.2
   Creation Time : Sat Oct 15 11:16:29 2011
      Raid Level : raid10
      Array Size : 490231808 (467.52 GiB 502.00 GB)
   Used Dev Size : 245115904 (233.76 GiB 251.00 GB)
    Raid Devices : 4
   Total Devices : 4
     Persistence : Superblock is persistent

   Intent Bitmap : Internal

     Update Time : Tue Nov 29 14:08:52 2011
           State : active, resyncing
  Active Devices : 4
 Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0

          Layout : far=2
      Chunk Size : 512K

  Rebuild Status : 0% complete

            Name : test:10  (local to host test)
            UUID : 3c5cbbdb:c1ea4d76:8ddc8037:973dbdc5
          Events : 54

     Number   Major   Minor   RaidDevice State
        0       8       16        0      active sync   /dev/sdb
        1       8       48        1      active sync   /dev/sdd
        2       8       32        2      active sync   /dev/sdc
        3       8        0        3      active sync   /dev/sda



Driver is

[    4.413958] megasas: 00.00.05.40-rc1 Tue. Jul. 26 17:00:00 PDT 2011
[    4.414015] megasas: 0x1000:0x0073:0x1014:0x03b1: bus 2:slot 0:func 0
[    4.414082] megaraid_sas 0000:02:00.0: PCI INT A -> GSI 18 (level, 
low) -> IRQ 18

Standard LSI megaraid SAS card.

I can probably try some other RAID levels this evening if it would help. 
I trashed the RAID recently anyway so I need to re-build it from scratch.


* Re: New version up with fix for md and other block devices
       [not found]                     ` <4ED4981E.6040501-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
@ 2011-11-29  8:45                       ` Kent Overstreet
       [not found]                         ` <20111129084544.GA16225-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
  2011-12-06  3:45                       ` Kent Overstreet
  1 sibling, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-11-29  8:45 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Nov 29, 2011 at 04:30:22PM +0800, Brad Campbell wrote:
> On 29/11/11 15:54, Kent Overstreet wrote:
> >Ok, that's weird. It shouldn't be able to register the second time
> >because the first register opens it exclusively, and the second open
> >will fail with -EBUSY.
> 
> Can reproduce it at will here.

I don't doubt you, just surprised. I'll try it out first thing tomorrow.
No reason I shouldn't be able to reproduce it.

> I doubt its the driver as the RAID is on the same card as the single
> drive I tested with last time.

Ok, raid10 is the one I didn't test - I tried raid0 and 6 and those
worked, but each raid layer has its own code to process bios so it
sounds like there's a corner case we're tripping.

> I can probably try some other RAID levels this evening if it would
> help. I trashed the RAID recently anyway so I need to re-build it
> from scratch.

Let me see if I can reproduce it first, I'll try it first thing in the
morning. Hopefully it'll be something easy.

Were you running in passthrough mode or was caching on when you got the
oops?

* Re: New version up with fix for md and other block devices
       [not found]     ` <4ED47771.9030309-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
  2011-11-29  6:31       ` Kent Overstreet
@ 2011-11-29  9:16       ` Kent Overstreet
  1 sibling, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2011-11-29  9:16 UTC (permalink / raw)
  To: tejun-hpIqsD4AKlfQT0dZR+AlfA
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Brad Campbell

Hey Tejun, mind taking a glance? I think I might be doing something dumb
here, but I'm not quite sure what and I haven't dealt much with struct
request:

(This is bcache on 3.1 with a raid10)

scsi_setup_fs_cmnd() is triggering BUG_ON(!req->nr_phys_segments). 

I'm not finding the code where req->nr_phys_segments is set right now,
but I'm guessing what's happening is it's getting set incorrectly
because the bio(s) in the request have bi_phys_segments set to 0, not -1.

If that's the case, then the bug is that bio_split_front()
(bcache/util.c) creates a new bio but doesn't initialize
bio->bi_phys_segments, so setting it to -1, so that it gets recalculated
later, should fix it...

That sound right to you?
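
To make the guess above concrete, here is a minimal userspace sketch of
the idea, not kernel code. Everything in it (toy_bio, SEGS_UNKNOWN,
toy_phys_segments(), toy_split_front()) is a hypothetical name made up
for illustration, and the "one segment per vector" rule is a toy
assumption. The kernel's own bookkeeping may differ, for instance by
tracking validity with a flag rather than a sentinel value. The sketch
only shows why a freshly split bio needs its cached count marked as
"not computed" instead of being left at 0:

```c
#include <assert.h>

#define SEGS_UNKNOWN (-1)   /* hypothetical sentinel: count not computed yet */

/* Toy stand-in for a bio; NOT the kernel's struct bio. */
struct toy_bio {
    int nr_vecs;        /* number of data vectors attached */
    int phys_segments;  /* cached segment count, or SEGS_UNKNOWN */
};

/* Return the cached count, recomputing it first if it was never set. */
int toy_phys_segments(struct toy_bio *bio)
{
    if (bio->phys_segments == SEGS_UNKNOWN)
        bio->phys_segments = bio->nr_vecs;  /* toy rule: one segment per vector */
    return bio->phys_segments;
}

/* Split the first `vecs` vectors off `bio` into `front`. */
void toy_split_front(struct toy_bio *bio, struct toy_bio *front, int vecs)
{
    assert(vecs > 0 && vecs < bio->nr_vecs);

    front->nr_vecs = vecs;
    /* The step suspected to be missing: without it a zeroed `front`
     * would report 0 segments to whoever consumes it later. */
    front->phys_segments = SEGS_UNKNOWN;

    bio->nr_vecs -= vecs;
    bio->phys_segments = SEGS_UNKNOWN;  /* remainder changed too, so recount */
}
```

In this toy model, splitting three vectors off an eight-vector toy_bio
and then calling toy_phys_segments() on each half yields 3 and 5, so a
consumer that insists on a non-zero count is satisfied.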

On Tue, Nov 29, 2011 at 02:10:57PM +0800, Brad Campbell wrote:
> This happens as soon as I try to attach a cache set to /dev/md10:
> 
> [   73.287556] md: resync of RAID array md10
> [   73.287614] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [   73.287620] ------------[ cut here ]------------
> [   73.287628] kernel BUG at drivers/scsi/scsi_lib.c:1152!
> [   73.287633] invalid opcode: 0000 [#1] SMP
> [   73.287638] CPU 1
> [   73.287641] Modules linked in: xt_state ipt_REJECT xt_CHECKSUM
> iptable_mangle nfs ipt_MASQUERADE xt_tcpudp iptable_filter
> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> ip_tables x_tables deflate zlib_deflate des_generic cbc ecb
> crypto_blkcipher sha1_generic md5 hmac crypto_hash cryptomgr aead
> crypto_algapi af_key fuse w83627ehf hwmon_vid netconsole configfs
> vhost_net powernow_k8 mperf kvm_amd kvm xhci_hcd k10temp i2c_piix4
> ohci_hcd ehci_hcd usbcore ahci atl1c libahci megaraid_sas [last
> unloaded: scsi_wait_scan]
> [   73.287694]
> [   73.287700] Pid: 1428, comm: md10_raid10 Not tainted
> 3.1.0-g143cdea #1 To Be Filled By O.E.M. To Be Filled By
> O.E.M./890GX Extreme4 R2.0
> [   73.287711] RIP: 0010:[<ffffffff812adbfe>]  [<ffffffff812adbfe>]
> scsi_setup_fs_cmnd+0xae/0xf0
> [   73.287726] RSP: 0018:ffff88041bb73be0  EFLAGS: 00010046
> [   73.287731] RAX: 0000000000000000 RBX: ffff88041ba5c560 RCX:
> 0000000000001000
> [   73.287736] RDX: 0000000000000000 RSI: ffff88041ba5c560 RDI:
> ffff88041bbf8000
> [   73.287740] RBP: ffff88041bbf8000 R08: 0000000000000000 R09:
> 0000000000000001
> [   73.287745] R10: 4080000000000000 R11: dead000000100100 R12:
> ffff88041bbf8000
> [   73.287750] R13: 0000000000000808 R14: ffff88041bbf8048 R15:
> ffff88041c029400
> [   73.287756] FS:  00007fa5803717c0(0000) GS:ffff88042fc40000(0000)
> knlGS:0000000000000000
> [   73.287761] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   73.287766] CR2: 0000000001b7c1b8 CR3: 0000000418a24000 CR4:
> 00000000000006e0
> [   73.287770] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [   73.287774] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [   73.287780] Process md10_raid10 (pid: 1428, threadinfo
> ffff88041bb72000, task ffff88041c760640)
> [   73.287784] Stack:
> [   73.287787]  0000000000000000 ffff88041ba5c560 ffff88041c603c30
> ffffffff812b86fc
> [   73.287794]  01ff88041bbf8800 ffff880400000001 ffff880400001000
> 0000000000000000
> [   73.287800]  ffff88041ba5c560 ffff88041ba5c560 ffff88041ba5c560
> ffff88041c603c30
> [   73.287807] Call Trace:
> [   73.287816]  [<ffffffff812b86fc>] ? sd_prep_fn+0x15c/0xa50
> [   73.287825]  [<ffffffff811c8be4>] ? blk_peek_request+0xb4/0x1d0
> [   73.287832]  [<ffffffff812ad120>] ? scsi_request_fn+0x50/0x4a0
> [   73.287840]  [<ffffffff811c9598>] ? blk_flush_plug_list+0x188/0x210
> [   73.287847]  [<ffffffff811c962b>] ? blk_finish_plug+0xb/0x30
> [   73.287854]  [<ffffffff812fae78>] ? raid10d+0x908/0xb50
> [   73.287862]  [<ffffffff81041183>] ? lock_timer_base+0x33/0x70
> [   73.287870]  [<ffffffff814089c5>] ? schedule_timeout+0x1c5/0x230
> [   73.287878]  [<ffffffff8130ec8f>] ? md_thread+0x10f/0x140
> [   73.287886]  [<ffffffff81050240>] ? wake_up_bit+0x40/0x40
> [   73.287892]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
> [   73.287898]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
> [   73.287905]  [<ffffffff8104fde6>] ? kthread+0x96/0xa0
> [   73.287912]  [<ffffffff8140bbf4>] ? kernel_thread_helper+0x4/0x10
> [   73.287920]  [<ffffffff8104fd50>] ? kthread_worker_fn+0x120/0x120
> [   73.287926]  [<ffffffff8140bbf0>] ? gs_change+0xb/0xb
> [   73.287929] Code: 80 00 00 00 00 48 83 c4 08 5b 5d c3 90 48 89 ef
> be 20 00 00 00 e8 73 a2 ff ff 48 85 c0 48 89 c7 74 d7 48 89 83 d8 00
> 00 00 eb 8d <0f> 0b eb fe 48 8b 00 48 85 c0 0f 84 67 ff ff ff 48 8b
> 40 50 48
> [   73.287969] RIP  [<ffffffff812adbfe>] scsi_setup_fs_cmnd+0xae/0xf0
> [   73.287977]  RSP <ffff88041bb73be0>
> [   73.287982] ---[ end trace 4ce4e575167cc0ff ]---
> [   73.287986] Kernel panic - not syncing: Fatal exception
> [   73.287993] Pid: 1428, comm: md10_raid10 Tainted: G      D
> 3.1.0-g143cdea #1
> [   73.287998] Call Trace:
> [   73.288005]  [<ffffffff81407679>] ? panic+0x92/0x193
> [   73.288012]  [<ffffffff81035451>] ? kmsg_dump+0x41/0xf0
> [   73.288021]  [<ffffffff8100504d>] ? oops_end+0x8d/0xa0
> [   73.288028]  [<ffffffff81002e34>] ? do_invalid_op+0x84/0xa0
> [   73.288035]  [<ffffffff812adbfe>] ? scsi_setup_fs_cmnd+0xae/0xf0
> [   73.288044]  [<ffffffff811d698e>] ? cfq_set_request+0x15e/0x3b0
> [   73.288050]  [<ffffffff8140ba75>] ? invalid_op+0x15/0x20
> [   73.288058]  [<ffffffff812adbfe>] ? scsi_setup_fs_cmnd+0xae/0xf0
> [   73.288064]  [<ffffffff812b86fc>] ? sd_prep_fn+0x15c/0xa50
> [   73.288071]  [<ffffffff811c8be4>] ? blk_peek_request+0xb4/0x1d0
> [   73.288078]  [<ffffffff812ad120>] ? scsi_request_fn+0x50/0x4a0
> [   73.288085]  [<ffffffff811c9598>] ? blk_flush_plug_list+0x188/0x210
> [   73.288092]  [<ffffffff811c962b>] ? blk_finish_plug+0xb/0x30
> [   73.288098]  [<ffffffff812fae78>] ? raid10d+0x908/0xb50
> [   73.288104]  [<ffffffff81041183>] ? lock_timer_base+0x33/0x70
> [   73.288112]  [<ffffffff814089c5>] ? schedule_timeout+0x1c5/0x230
> [   73.288119]  [<ffffffff8130ec8f>] ? md_thread+0x10f/0x140
> [   73.288126]  [<ffffffff81050240>] ? wake_up_bit+0x40/0x40
> [   73.288132]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
> [   73.288138]  [<ffffffff8130eb80>] ? md_register_thread+0x100/0x100
> [   73.288144]  [<ffffffff8104fde6>] ? kthread+0x96/0xa0
> [   73.288151]  [<ffffffff8140bbf4>] ? kernel_thread_helper+0x4/0x10
> [   73.288159]  [<ffffffff8104fd50>] ? kthread_worker_fn+0x120/0x120
> [   73.288165]  [<ffffffff8140bbf0>] ? gs_change+0xb/0xb
> [   73.290590] Rebooting in 10 seconds..
> 
> I can attach the same cache set to /dev/sde and all is ok (the same
> config I used to run the last set of benchmarks).
> 
> Regards,
> Brad
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: New version up with fix for md and other block devices
       [not found]                         ` <20111129084544.GA16225-RcKxWJ4Cfj3IzGYXcIpNmNLIRw13R84JkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
@ 2011-12-03  6:27                           ` Kent Overstreet
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Overstreet @ 2011-12-03  6:27 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Just reproduced it - raid10 did the trick.

I'll try and debug it this weekend.

On Tue, Nov 29, 2011 at 12:45 AM, Kent Overstreet
<koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, Nov 29, 2011 at 04:30:22PM +0800, Brad Campbell wrote:
>> On 29/11/11 15:54, Kent Overstreet wrote:
>> >Ok, that's weird. It shouldn't be able to register the second time
>> >because the first register opens it exclusively, and the second open
>> >will fail with -EBUSY.
>>
>> Can reproduce it at will here.
>
> I don't doubt you, just surprised. I'll try it out first thing tomorrow.
> No reason I shouldn't be able to reproduce it.
>
>> I doubt it's the driver as the RAID is on the same card as the single
>> drive I tested with last time.
>
> Ok, raid10 is the one I didn't test - I tried raid0 and 6 and those
> worked, but each raid layer has its own code to process bios so it
> sounds like there's a corner case we're tripping.
>
>> I can probably try some other RAID levels this evening if it would
>> help. I trashed the RAID recently anyway so I need to re-build it
>> from scratch.
>
> Let me see if I can reproduce it first, I'll try it first thing in the
> morning. Hopefully it'll be something easy.
>
> Were you running in passthrough mode or was caching on when you got the
> oops?

* Re: New version up with fix for md and other block devices
       [not found]                     ` <4ED4981E.6040501-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
  2011-11-29  8:45                       ` Kent Overstreet
@ 2011-12-06  3:45                       ` Kent Overstreet
  2011-12-06  4:02                         ` Kent Overstreet
  1 sibling, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-06  3:45 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Fixed the raid10 issue - it's working for me, and the code's up. Haven't
fixed the duplicate registration issue yet, though - I'll take a look at
that next...

* Re: New version up with fix for md and other block devices
  2011-12-06  3:45                       ` Kent Overstreet
@ 2011-12-06  4:02                         ` Kent Overstreet
       [not found]                           ` <CAC7rs0saVh=a587mNCTCJwbVi7-u7kRuXu6-pZuJ6CRs1AACsw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-06  4:02 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Argh. I spoke too soon, it just exploded after the second bonnie run.
The block layer is badly in need of some cleanup... *grumble*

On Mon, Dec 5, 2011 at 7:45 PM, Kent Overstreet
<kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Fixed the raid10 issue - it's working for me, and the code's up. Haven't
> fixed the duplicate registration issue yet, though - I'll take a look at
> that next...
>

* Re: New version up with fix for md and other block devices
       [not found]                           ` <CAC7rs0saVh=a587mNCTCJwbVi7-u7kRuXu6-pZuJ6CRs1AACsw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-06  4:41                             ` Kent Overstreet
       [not found]                               ` <CAC7rs0ttY4Ama4v7yTepVTc65TyCo3+T4aPFoHJW1CwA8mDuUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-06  4:41 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

So, it should work in writethrough mode. I discovered a really
annoying issue with background writeback that's going to take me a bit
to decide how to solve...

On Mon, Dec 5, 2011 at 8:02 PM, Kent Overstreet
<kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Argh. I spoke too soon, it just exploded after the second bonnie run.
> The block layer is badly in need of some cleanup... *grumble*
>
> On Mon, Dec 5, 2011 at 7:45 PM, Kent Overstreet
> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Fixed the raid10 issue - it's working for me, and the code's up. Haven't
>> fixed the duplicate registration issue yet, though - I'll take a look at
>> that next...
>>
>

* Re: New version up with fix for md and other block devices
       [not found]                               ` <CAC7rs0ttY4Ama4v7yTepVTc65TyCo3+T4aPFoHJW1CwA8mDuUA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-06  6:01                                 ` Kent Overstreet
       [not found]                                   ` <CAH+dOxLW71YKpC1YL61osFq6oDVWxoj4ajLht3EqMUiWTYogTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-06  6:01 UTC (permalink / raw)
  To: Brad Campbell; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Ok, it looks like as long as your cache's bucket size is not greater
than 1 MB, everything should work, including writeback. Look forward to
hearing if it works for you :)

On Mon, Dec 5, 2011 at 8:41 PM, Kent Overstreet
<kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> So, it should work in writethrough mode. I discovered a really
> annoying issue with background writeback that's going to take me a bit
> to decide how to solve...
>
> On Mon, Dec 5, 2011 at 8:02 PM, Kent Overstreet
> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Argh. I spoke too soon, it just exploded after the second bonnie run.
>> The block layer is badly in need of some cleanup... *grumble*
>>
>> On Mon, Dec 5, 2011 at 7:45 PM, Kent Overstreet
>> <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Fixed the raid10 issue - it's working for me, and the code's up. Haven't
>>> fixed the duplicate registration issue yet, though - I'll take a look at
>>> that next...
>>>
>>

* Re: New version up with fix for md and other block devices
       [not found]                                   ` <CAH+dOxLW71YKpC1YL61osFq6oDVWxoj4ajLht3EqMUiWTYogTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-09  2:18                                     ` Brad Campbell
       [not found]                                       ` <4EE16FED.5080809-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Brad Campbell @ 2011-12-09  2:18 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 06/12/11 14:01, Kent Overstreet wrote:
> Ok, it looks like as long as your cache's bucket size is not greater
> than 1 mb everything should work, including writeback. Look forward to
> hearing if it works for you :)

I saw you rebased the repository, so I cloned fresh this morning and got
this compile error:

brad@test:/raid10/src/linux-bcache$ make
   CHK     include/linux/version.h
   CHK     include/generated/utsrelease.h
   CALL    scripts/checksyscalls.sh
   CHK     include/generated/compile.h
   CHK     kernel/config_data.h
   CC      drivers/block/bcache/super.o
drivers/block/bcache/super.c: In function ‘lioctl_dev’:
drivers/block/bcache/super.c:1209: error: ‘const struct block_device_operations’ has no member named ‘locked_ioctl’
drivers/block/bcache/super.c: At top level:
drivers/block/bcache/super.c:1215: error: unknown field ‘locked_ioctl’ specified in initializer
make[3]: *** [drivers/block/bcache/super.o] Error 1
make[2]: *** [drivers/block/bcache] Error 2
make[1]: *** [drivers/block] Error 2
make: *** [drivers] Error 2

Regards,
Brad

* Re: New version up with fix for md and other block devices
       [not found]                                       ` <4EE16FED.5080809-+nnirC7rrGZibQn6LdNjmg@public.gmane.org>
@ 2011-12-09 10:01                                         ` Kent Overstreet
       [not found]                                           ` <CAC7rs0tnhwAhQF53nTnHsdnnFOKpbG1BvAE5EcbbvsFWR-_6RA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Overstreet @ 2011-12-09 10:01 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Bah, that's what I get for being lazy. Should work now.

Also, there's no need to re-clone when I rebase - just do git reset
--hard origin/bcache

On Thu, Dec 8, 2011 at 6:18 PM, Brad Campbell <brad-+nnirC7rrGZibQn6LdNjmg@public.gmane.org> wrote:
> On 06/12/11 14:01, Kent Overstreet wrote:
>>
>> Ok, it looks like as long as your cache's bucket size is not greater
>> than 1 mb everything should work, including writeback. Look forward to
>> hearing if it works for you :)
>
>
> I saw you re-based the repository so I cloned fresh this morning and get
> this compile error :
>
> brad@test:/raid10/src/linux-bcache$ make
>  CHK     include/linux/version.h
>  CHK     include/generated/utsrelease.h
>  CALL    scripts/checksyscalls.sh
>  CHK     include/generated/compile.h
>  CHK     kernel/config_data.h
>  CC      drivers/block/bcache/super.o
> drivers/block/bcache/super.c: In function ‘lioctl_dev’:
> drivers/block/bcache/super.c:1209: error: ‘const struct
> block_device_operations’ has no member named ‘locked_ioctl’
> drivers/block/bcache/super.c: At top level:
> drivers/block/bcache/super.c:1215: error: unknown field ‘locked_ioctl’
> specified in initializer
> make[3]: *** [drivers/block/bcache/super.o] Error 1
> make[2]: *** [drivers/block/bcache] Error 2
> make[1]: *** [drivers/block] Error 2
> make: *** [drivers] Error 2
>
> Regards,
> Brad

* Re: New version up with fix for md and other block devices
       [not found]                                           ` <CAC7rs0tnhwAhQF53nTnHsdnnFOKpbG1BvAE5EcbbvsFWR-_6RA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-09 13:00                                             ` Brad Campbell
  0 siblings, 0 replies; 17+ messages in thread
From: Brad Campbell @ 2011-12-09 13:00 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Kent Overstreet, linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 09/12/11 18:01, Kent Overstreet wrote:
> Bah, that's what I get for being lazy. Should work now.
>
> Also, there's no need to re clone when I rebase - just do git reset
> --hard origin/bcache
>

Will do. It's never an extensive process anyway as I always --reference 
a local git tree, so it's only bcache that gets pulled.

I'll compile it up in the morning and run some tests.

Regards,
Brad
