public inbox for linux-scsi@vger.kernel.org
* [Topic] Bcache
@ 2012-03-14 13:32 Kent Overstreet
  2012-03-14 15:53 ` [Lsf-pc] " Vivek Goyal
  0 siblings, 1 reply; 21+ messages in thread
From: Kent Overstreet @ 2012-03-14 13:32 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-scsi, nauman

I'm already registered to attend, but would it be too late in the
process to give a talk? I'd like to give a short talk about bcache, what
it does and where it's going (more than just caching).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 13:32 [Topic] Bcache Kent Overstreet
@ 2012-03-14 15:53 ` Vivek Goyal
  2012-03-14 17:24   ` Kent Overstreet
  2012-03-14 18:12   ` chetan loke
  0 siblings, 2 replies; 21+ messages in thread
From: Vivek Goyal @ 2012-03-14 15:53 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote:
> I'm already registered to attend, but would it be too late in the
> process to give a talk? I'd like to give a short talk about bcache, what
> it does and where it's going (more than just caching).

[CCing dm-devel list]

I am curious whether you considered writing a device mapper driver for
this and, if so, why that was not a good choice. It seems to be a stacked
device, and device mapper should be good at that. All the configuration
through sysfs seems a little odd to me.

On a side note, I was playing with bcache a bit. I tried to register the
cache device and it crashed. (I guess I should post this on the relevant
mailing list.)

# echo /dev/sdc > /sys/fs/bcache/register

[ 6758.314093] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-bcache+ #2
Hewlett-Packard HP xw6600 Workstation/0A9Ch
[ 6758.314093] RIP: 0010:[<ffffffff8146625b>]  [<ffffffff8146625b>]
closure_put+0x5b/0xe0
[ 6758.314093] RSP: 0018:ffff88013fc83c60  EFLAGS: 00010246
[ 6758.314093] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801281204a0 RCX:
0000000000000000
[ 6758.314093] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
ffff88013906ec48
[ 6758.314093] RBP: ffff88013fc83c60 R08: 0000000000000000 R09:
0000000000000001
[ 6758.314093] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ 6758.314093] R13: ffff880130b58560 R14: 0000000000080000 R15:
0000000000000000
[ 6758.314093] FS:  0000000000000000(0000) GS:ffff88013fc80000(0000)
knlGS:0000000000000000
[ 6758.314093] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6758.314093] CR2: 00007f9becec7000 CR3: 0000000137fe0000 CR4:
00000000000006e0
[ 6758.314093] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 6758.314093] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 6758.314093] Process kworker/0:1 (pid: 0, threadinfo ffff88013a44a000,
task ffff88013a458000)
[ 6758.314093] Stack:
[ 6758.314093]  ffff88013fc83c80 ffffffff8145ee6d ffffffff00000000
ffff8801281204a0
[ 6758.314093]  ffff88013fc83c90 ffffffff81173a9d ffff88013fc83cc0
ffffffff812d15d3
[ 6758.314093]  ffff88013a44a000 0000000000000000 ffff8801281204a0
0000000000080000
[ 6758.314093] Call Trace:
[ 6758.314093]  <IRQ> 
[ 6758.314093]  [<ffffffff8145ee6d>] uuid_endio+0x3d/0x50
[ 6758.314093]  [<ffffffff81173a9d>] bio_endio+0x1d/0x40
[ 6758.314093]  [<ffffffff812d15d3>] req_bio_endio+0x83/0xc0
[ 6758.314093]  [<ffffffff812d4f71>] blk_update_request+0x101/0x5c0
[ 6758.314093]  [<ffffffff812d51a2>] ? blk_update_request+0x332/0x5c0
[ 6758.314093]  [<ffffffff812d5461>] blk_update_bidi_request+0x31/0x90
[ 6758.314093]  [<ffffffff812d54ec>] blk_end_bidi_request+0x2c/0x80
[ 6758.314093]  [<ffffffff812d5580>] blk_end_request+0x10/0x20
[ 6758.314093]  [<ffffffff81471b7c>] scsi_io_completion+0x9c/0x5f0
[ 6758.314093]  [<ffffffff81468940>] scsi_finish_command+0xb0/0xe0
[ 6758.314093]  [<ffffffff81471965>] scsi_softirq_done+0xa5/0x140
[ 6758.314093]  [<ffffffff812db70b>] blk_done_softirq+0x7b/0x90
[ 6758.314093]  [<ffffffff8104fc65>] __do_softirq+0xc5/0x3a0
[ 6758.314093]  [<ffffffff817f6dac>] call_softirq+0x1c/0x30
[ 6758.314093]  [<ffffffff8100419d>] do_softirq+0x8d/0xc0
[ 6758.314093]  [<ffffffff8105027e>] irq_exit+0xae/0xe0
[ 6758.314093]  [<ffffffff817f74b3>] do_IRQ+0x63/0xe0
[ 6758.314093]  [<ffffffff817ecc30>] common_interrupt+0x70/0x70
[ 6758.314093]  <EOI> 
[ 6758.314093]  [<ffffffff8100a1f6>] ? mwait_idle+0xb6/0x470
[ 6758.314093]  [<ffffffff8100a1ed>] ? mwait_idle+0xad/0x470
[ 6758.314093]  [<ffffffff810011df>] cpu_idle+0x8f/0xd0
[ 6758.314093]  [<ffffffff817da107>] start_secondary+0x1be/0x1c2
[ 6758.314093] Code: 00 48 8b 50 48 83 e2 08 0f 85 9c 00 00 00 48 8b 50 48
83 e2 10 0f 85 8d 00 00 00 48 83 78 18 00 75 46 48 8b 40 40 48 85 c0 74 24 
[ 6758.314093]  8b 50 48 48 c1 ea 04 89 d1 89 f2 83 e1 01 f0 0f c1 50 4c
83 
[ 6758.314093] RIP  [<ffffffff8146625b>] closure_put+0x5b/0xe0
[ 6758.314093]  RSP <ffff88013fc83c60>

Thanks
Vivek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 15:53 ` [Lsf-pc] " Vivek Goyal
@ 2012-03-14 17:24   ` Kent Overstreet
  2012-03-14 22:01     ` Bcache Mike Snitzer
  2012-03-15 19:43     ` [Lsf-pc] [Topic] Bcache Vivek Goyal
  2012-03-14 18:12   ` chetan loke
  1 sibling, 2 replies; 21+ messages in thread
From: Kent Overstreet @ 2012-03-14 17:24 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 11:53 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote:
>> I'm already registered to attend, but would it be too late in the
>> process to give a talk? I'd like to give a short talk about bcache, what
>> it does and where it's going (more than just caching).
>
> [CCing dm-devel list]
>
> I am curious whether you considered writing a device mapper driver for
> this and, if so, why that was not a good choice. It seems to be a stacked
> device, and device mapper should be good at that. All the configuration
> through sysfs seems a little odd to me.

Everyone asks this. Yeah, I considered it, I tried to make it work for
a couple weeks but it was far more trouble than it was worth. I'm not
opposed to someone else working on it but I'm not going to spend any
more time on it myself.
>
> On a side note, I was playing with bcache a bit. I tried to register the
> cache device and it crashed. (I guess I should post this on the relevant
> mailing list.)

Can you post the full log? There was a bug where if it encountered an
error during registration, it wouldn't wait for a uuid read or write
before tearing everything down - that's what your backtrace looks like
to me.
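[Editorial note: the failure mode described above is a reference-counting race - the error path tore things down without a reference covering the in-flight uuid I/O, so the completion's closure_put() ran against freed memory. A toy model of the invariant (hypothetical names; this is not bcache's actual closure code):]

```python
class Closure:
    """Toy model of a bcache-style closure: teardown is deferred until
    every outstanding reference (e.g. an in-flight I/O) is dropped."""

    def __init__(self, destructor):
        self.remaining = 1          # the registration path holds one ref
        self.destructor = destructor
        self.destroyed = False

    def get(self):
        assert not self.destroyed, "use-after-free: get() after teardown"
        self.remaining += 1

    def put(self):
        assert not self.destroyed, "use-after-free: put() after teardown"
        self.remaining -= 1
        if self.remaining == 0:     # last reference dropped: safe to free
            self.destroyed = True
            self.destructor()


events = []
cl = Closure(destructor=lambda: events.append("teardown"))

cl.get()              # taken BEFORE submitting the uuid read/write
cl.put()              # registration hits an error and drops its ref ...
assert events == []   # ... but teardown correctly waits for the I/O

cl.put()              # uuid_endio(): I/O completes, last ref dropped
assert events == ["teardown"]
```

The crash corresponds to skipping the get() before submitting the I/O: the error path would then free the structure while the uuid completion was still pending.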

You could try the bcache-3.2-dev branch, too. I have a newer branch
with a ton of bugfixes but I'm waiting until it's seen more testing
before I post it.

>
> # echo /dev/sdc > /sys/fs/bcache/register
>
> [ 6758.314093] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-bcache+ #2
> Hewlett-Packard HP xw6600 Workstation/0A9Ch
> [ 6758.314093] RIP: 0010:[<ffffffff8146625b>]  [<ffffffff8146625b>]
> closure_put+0x5b/0xe0
> [ 6758.314093] RSP: 0018:ffff88013fc83c60  EFLAGS: 00010246
> [ 6758.314093] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801281204a0 RCX:
> 0000000000000000
> [ 6758.314093] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
> ffff88013906ec48
> [ 6758.314093] RBP: ffff88013fc83c60 R08: 0000000000000000 R09:
> 0000000000000001
> [ 6758.314093] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [ 6758.314093] R13: ffff880130b58560 R14: 0000000000080000 R15:
> 0000000000000000
> [ 6758.314093] FS:  0000000000000000(0000) GS:ffff88013fc80000(0000)
> knlGS:0000000000000000
> [ 6758.314093] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6758.314093] CR2: 00007f9becec7000 CR3: 0000000137fe0000 CR4:
> 00000000000006e0
> [ 6758.314093] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 6758.314093] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 6758.314093] Process kworker/0:1 (pid: 0, threadinfo ffff88013a44a000,
> task ffff88013a458000)
> [ 6758.314093] Stack:
> [ 6758.314093]  ffff88013fc83c80 ffffffff8145ee6d ffffffff00000000
> ffff8801281204a0
> [ 6758.314093]  ffff88013fc83c90 ffffffff81173a9d ffff88013fc83cc0
> ffffffff812d15d3
> [ 6758.314093]  ffff88013a44a000 0000000000000000 ffff8801281204a0
> 0000000000080000
> [ 6758.314093] Call Trace:
> [ 6758.314093]  <IRQ>
> [ 6758.314093]  [<ffffffff8145ee6d>] uuid_endio+0x3d/0x50
> [ 6758.314093]  [<ffffffff81173a9d>] bio_endio+0x1d/0x40
> [ 6758.314093]  [<ffffffff812d15d3>] req_bio_endio+0x83/0xc0
> [ 6758.314093]  [<ffffffff812d4f71>] blk_update_request+0x101/0x5c0
> [ 6758.314093]  [<ffffffff812d51a2>] ? blk_update_request+0x332/0x5c0
> [ 6758.314093]  [<ffffffff812d5461>] blk_update_bidi_request+0x31/0x90
> [ 6758.314093]  [<ffffffff812d54ec>] blk_end_bidi_request+0x2c/0x80
> [ 6758.314093]  [<ffffffff812d5580>] blk_end_request+0x10/0x20
> [ 6758.314093]  [<ffffffff81471b7c>] scsi_io_completion+0x9c/0x5f0
> [ 6758.314093]  [<ffffffff81468940>] scsi_finish_command+0xb0/0xe0
> [ 6758.314093]  [<ffffffff81471965>] scsi_softirq_done+0xa5/0x140
> [ 6758.314093]  [<ffffffff812db70b>] blk_done_softirq+0x7b/0x90
> [ 6758.314093]  [<ffffffff8104fc65>] __do_softirq+0xc5/0x3a0
> [ 6758.314093]  [<ffffffff817f6dac>] call_softirq+0x1c/0x30
> [ 6758.314093]  [<ffffffff8100419d>] do_softirq+0x8d/0xc0
> [ 6758.314093]  [<ffffffff8105027e>] irq_exit+0xae/0xe0
> [ 6758.314093]  [<ffffffff817f74b3>] do_IRQ+0x63/0xe0
> [ 6758.314093]  [<ffffffff817ecc30>] common_interrupt+0x70/0x70
> [ 6758.314093]  <EOI>
> [ 6758.314093]  [<ffffffff8100a1f6>] ? mwait_idle+0xb6/0x470
> [ 6758.314093]  [<ffffffff8100a1ed>] ? mwait_idle+0xad/0x470
> [ 6758.314093]  [<ffffffff810011df>] cpu_idle+0x8f/0xd0
> [ 6758.314093]  [<ffffffff817da107>] start_secondary+0x1be/0x1c2
> [ 6758.314093] Code: 00 48 8b 50 48 83 e2 08 0f 85 9c 00 00 00 48 8b 50 48
> 83 e2 10 0f 85 8d 00 00 00 48 83 78 18 00 75 46 48 8b 40 40 48 85 c0 74 24
> [ 6758.314093]  8b 50 48 48 c1 ea 04 89 d1 89 f2 83 e1 01 f0 0f c1 50 4c
> 83
> [ 6758.314093] RIP  [<ffffffff8146625b>] closure_put+0x5b/0xe0
> [ 6758.314093]  RSP <ffff88013fc83c60>
>
> Thanks
> Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 15:53 ` [Lsf-pc] " Vivek Goyal
  2012-03-14 17:24   ` Kent Overstreet
@ 2012-03-14 18:12   ` chetan loke
  2012-03-14 18:17     ` Kent Overstreet
  1 sibling, 1 reply; 21+ messages in thread
From: chetan loke @ 2012-03-14 18:12 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Kent Overstreet, lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 11:53 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote:
>> I'm already registered to attend, but would it be too late in the
>> process to give a talk? I'd like to give a short talk about bcache, what
>> it does and where it's going (more than just caching).
>
> [CCing dm-devel list]
>
> I am curious whether you considered writing a device mapper driver for
> this and, if so, why that was not a good choice. It seems to be a stacked
> device, and device mapper should be good at that. All the configuration
> through sysfs seems a little odd to me.

I'm not a dm guru, but a quick scan of flash-cache suggests it does
what you are saying. Now, if performance isn't acceptable, the hashes
can be replaced with trees and whatnot. Also, no one would need to
re-invent the stacking mechanism. I saw thin support (at least
documented) for dm. Plus, no matter what cache you come up with, you
may have to persist/store the metadata associated with it, and dm
seems like the right place to abstract that.



> Vivek

Chetan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:12   ` chetan loke
@ 2012-03-14 18:17     ` Kent Overstreet
  2012-03-14 18:33       ` chetan loke
  0 siblings, 1 reply; 21+ messages in thread
From: Kent Overstreet @ 2012-03-14 18:17 UTC (permalink / raw)
  To: chetan loke; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 2:12 PM, chetan loke <loke.chetan@gmail.com> wrote:
> I'm not a dm guru, but a quick scan of flash-cache suggests it does
> what you are saying. Now, if performance isn't acceptable, the hashes
> can be replaced with trees and whatnot. Also, no one would need to
> re-invent the stacking mechanism. I saw thin support (at least
> documented) for dm. Plus, no matter what cache you come up with, you
> may have to persist/store the metadata associated with it, and dm
> seems like the right place to abstract that.

Bcache kills flashcache on performance - bcache can do around a
million IOPS on 4k random reads, and beats it on real-world
applications and hardware too (e.g. MySQL).

I'm not aware of any real features I'm missing out on by not using dm...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:17     ` Kent Overstreet
@ 2012-03-14 18:33       ` chetan loke
  2012-03-14 18:41         ` Kent Overstreet
  2012-03-14 18:54         ` Ted Ts'o
  0 siblings, 2 replies; 21+ messages in thread
From: chetan loke @ 2012-03-14 18:33 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 2:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
> On Wed, Mar 14, 2012 at 2:12 PM, chetan loke <loke.chetan@gmail.com> wrote:
>> I'm not a dm guru, but a quick scan of flash-cache suggests it does
>> what you are saying. Now, if performance isn't acceptable, the hashes
>> can be replaced with trees and whatnot. Also, no one would need to
>> re-invent the stacking mechanism. I saw thin support (at least
>> documented) for dm. Plus, no matter what cache you come up with, you
>> may have to persist/store the metadata associated with it, and dm
>> seems like the right place to abstract that.
>
> Bcache kills flashcache on performance - bcache can do around a
> million IOPS on 4k random reads, and beats it on real-world
> applications and hardware too (e.g. MySQL).
>

Don't get too carried away with the perf numbers. Re-read what I said:
"if performance isn't acceptable, the hashes can be replaced with
trees and whatnot".

> I'm not aware of any real features I'm missing out on by not using dm...

But you are not explaining why dm is not the right stack. Just because
it crashed when you tried doesn't mean it's not the right place.
flash-cache works, doesn't it? Is flash-cache's limitation because
it's a dm target, because it uses hashing, or something else?
There are start-ups doing quite well with SSD-cache+dm. So
please stop kidding yourself.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:33       ` chetan loke
@ 2012-03-14 18:41         ` Kent Overstreet
  2012-03-14 18:47           ` Christoph Hellwig
  2012-03-14 19:04           ` chetan loke
  2012-03-14 18:54         ` Ted Ts'o
  1 sibling, 2 replies; 21+ messages in thread
From: Kent Overstreet @ 2012-03-14 18:41 UTC (permalink / raw)
  To: chetan loke; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 2:33 PM, chetan loke <loke.chetan@gmail.com> wrote:
> On Wed, Mar 14, 2012 at 2:17 PM, Kent Overstreet <koverstreet@google.com> wrote:
>> On Wed, Mar 14, 2012 at 2:12 PM, chetan loke <loke.chetan@gmail.com> wrote:
>>> I'm not a dm guru, but a quick scan of flash-cache suggests it does
>>> what you are saying. Now, if performance isn't acceptable, the hashes
>>> can be replaced with trees and whatnot. Also, no one would need to
>>> re-invent the stacking mechanism. I saw thin support (at least
>>> documented) for dm. Plus, no matter what cache you come up with, you
>>> may have to persist/store the metadata associated with it, and dm
>>> seems like the right place to abstract that.
>>
>> Bcache kills flashcache on performance - bcache can do around a
>> million IOPS on 4k random reads, and beats it on real-world
>> applications and hardware too (e.g. MySQL).
>>
>
> Don't get too carried away with the perf numbers. Re-read what I said:
> "if performance isn't acceptable, the hashes can be replaced with
> trees and whatnot".

Nobody's stopping you.

>> I'm not aware of any real features I'm missing out on by not using dm...
>
> But you are not explaining why dm is not the right stack. Just because
> it crashed when you tried doesn't mean it's not the right place.
> flash-cache works, doesn't it? Is flash-cache's limitation because
> it's a dm target, because it uses hashing, or something else?
> There are start-ups doing quite well with SSD-cache+dm. So
> please stop kidding yourself.

If you want me to implement bcache differently, shouldn't you explain
why? I'm not sure why I _have_ to justify my decisions to you.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:41         ` Kent Overstreet
@ 2012-03-14 18:47           ` Christoph Hellwig
  2012-03-14 19:04           ` chetan loke
  1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2012-03-14 18:47 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: chetan loke, nauman, dm-devel, lsf-pc, linux-scsi, Vivek Goyal

On Wed, Mar 14, 2012 at 02:41:35PM -0400, Kent Overstreet wrote:
> If you want me to implement bcache differently, shouldn't you explain
> why? I'm not sure why I _have_ to justify my decisions to you.

You don't have to - unless you want to get bcache merged into the
mainline kernel.  If you don't want to you probably shouldn't bother
appearing at the LSF, though.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:33       ` chetan loke
  2012-03-14 18:41         ` Kent Overstreet
@ 2012-03-14 18:54         ` Ted Ts'o
  2012-03-14 19:22           ` chetan loke
  2012-03-15 17:02           ` Kent Overstreet
  1 sibling, 2 replies; 21+ messages in thread
From: Ted Ts'o @ 2012-03-14 18:54 UTC (permalink / raw)
  To: chetan loke
  Cc: Kent Overstreet, Vivek Goyal, lsf-pc, nauman, linux-scsi,
	dm-devel

On Wed, Mar 14, 2012 at 02:33:25PM -0400, chetan loke wrote:
> But you are not explaining why dm is not the right stack. Just because
> it crashed when you tried doesn't mean it's not the right place.
> flash-cache works, doesn't it? Is flash-cache's limitation because
> it's a dm target, because it uses hashing, or something else?
> There are start-ups doing quite well with SSD-cache+dm. So
> please stop kidding yourself.

SATA-attached flash is not the only kind of flash out there, you know.
There is also PCIe-attached flash, which is a wee bit faster (where wee
is defined as multiple orders of magnitude --- SATA-attached SSDs
typically have thousands of IOPS; Fusion I/O is shipping product today
with hundreds of thousands of IOPS, and has demonstrated a billion
IOPS early this year).  And Fusion I/O isn't the only company shipping
PCIe-attached flash products.

Startups may be doing great on SSDs; you may want to accept the fact
that there is stuff out there which is way, way, way better than
SSDs and which is available on the market *today*.

And it's not as if bcache is a new project.  It's working code,
just like flashcache is today.  So it's not as if it needs to justify
its existence.

Best regards,

					- Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:41         ` Kent Overstreet
  2012-03-14 18:47           ` Christoph Hellwig
@ 2012-03-14 19:04           ` chetan loke
  2012-03-15 17:01             ` Kent Overstreet
  1 sibling, 1 reply; 21+ messages in thread
From: chetan loke @ 2012-03-14 19:04 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 2:41 PM, Kent Overstreet <koverstreet@google.com> wrote:
> If you want me to implement bcache differently, shouldn't you explain why

Relax. I explained it already, but you are being defensive about your
code. flash-cache works, period. And I may be wrong, but it is GPL'd.
If perf is an issue, then at least let everyone know how it can be
improved rather than saying my way or the highway. Aren't you saying
in your patches - support for thin provisioning etc.? But if dm
provides it, then why are you duplicating code?

> why? I'm not sure why I _have_ to justify my decisions to you.

Others might want to contribute to it and not just consume it. This
ain't your local sandbox, so it's quite common to get such questions
when you are trying to add new functionality. Maybe I missed some of
your emails; if so, point me to them.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:54         ` Ted Ts'o
@ 2012-03-14 19:22           ` chetan loke
  2012-03-15 17:02           ` Kent Overstreet
  1 sibling, 0 replies; 21+ messages in thread
From: chetan loke @ 2012-03-14 19:22 UTC (permalink / raw)
  To: Ted Ts'o, chetan loke, Kent Overstreet, Vivek Goyal, lsf-pc,
	nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 2:54 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Wed, Mar 14, 2012 at 02:33:25PM -0400, chetan loke wrote:
>> But you are not explaining why dm is not the right stack. Just because
>> it crashed when you tried doesn't mean it's not the right place.
>> flash-cache works, doesn't it? Is flash-cache's limitation because
>> it's a dm target, because it uses hashing, or something else?
>> There are start-ups doing quite well with SSD-cache+dm. So
>> please stop kidding yourself.
>
> SATA-attached flash is not the only kind of flash out there, you know.
> There is also PCIe-attached flash, which is a wee bit faster (where wee
> is defined as multiple orders of magnitude --- SATA-attached SSDs
> typically have thousands of IOPS; Fusion I/O is shipping product today
> with hundreds of thousands of IOPS, and has demonstrated a billion
> IOPS early this year).  And Fusion I/O isn't the only company shipping
> PCIe-attached flash products.
>

We've designed Linux targets doing a million IOPS even before PCIe
flash came into the picture, you know. So I think we do know a thing
or two about million-IOPS performance. When I said 'cache' I used it
loosely. The backing store can be anything - an SSD, PCIe flash, or an
adjacent blade over IB.


> Startups may be doing great on SSDs; you may want to accept the fact
> that there is stuff out there which is way, way, way better than
> SSDs and which is available on the market *today*.
>
> And it's not as if bcache is a new project.  It's working code,
> just like flashcache is today.  So it's not as if it needs to justify
> its existence.
>

We are talking about approaches, not existence.

> Best regards,
>
>                                        - Ted

BR,
Chetan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bcache
  2012-03-14 17:24   ` Kent Overstreet
@ 2012-03-14 22:01     ` Mike Snitzer
  2012-03-14 22:09       ` [Lsf-pc] Bcache Williams, Dan J
  2012-03-15 17:27       ` Bcache Kent Overstreet
  2012-03-15 19:43     ` [Lsf-pc] [Topic] Bcache Vivek Goyal
  1 sibling, 2 replies; 21+ messages in thread
From: Mike Snitzer @ 2012-03-14 22:01 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel,
	Christoph Hellwig

On Wed, Mar 14 2012 at  1:24pm -0400,
Kent Overstreet <koverstreet@google.com> wrote:

> On Wed, Mar 14, 2012 at 11:53 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote:
> >> I'm already registered to attend, but would it be too late in the
> >> process to give a talk? I'd like to give a short talk about bcache, what
> >> it does and where it's going (more than just caching).
> >
> > [CCing dm-devel list]
> >
> > I am curious whether you considered writing a device mapper driver for
> > this and, if so, why that was not a good choice. It seems to be a stacked
> > device, and device mapper should be good at that. All the configuration
> > through sysfs seems a little odd to me.
> 
> Everyone asks this. Yeah, I considered it, I tried to make it work for
> a couple weeks but it was far more trouble than it was worth. I'm not
> opposed to someone else working on it but I'm not going to spend any
> more time on it myself.

I really wish you'd have worked with dm-devel more persistently; you did
post twice to dm-devel (at an awkward time of year, but whatever):
http://www.redhat.com/archives/dm-devel/2010-December/msg00204.html
http://www.redhat.com/archives/dm-devel/2010-December/msg00232.html

But somewhere along the way you privately gave up on DM... and have
since repeatedly talked critically of DM.  Yet you have _never_
substantiated _why_ DM is "far more trouble than it was worth", etc.

Reading between the lines on previous LKML bcache threads where the
question of "why not use DM or MD?" came up:
https://lkml.org/lkml/2011/9/11/117
https://lkml.org/lkml/2011/9/15/376

It seemed your primary focus was on getting into the details of the SSD
caching ASAP -- because that is what interested you.  Both DM and MD
have a learning curve; maybe it was too frustrating and/or
distracting to tackle.

Anyway, I don't fault you for initially doing your own thing for a
virtual device framework -- it allowed you to get to the stuff you
really cared about sooner.

That said, it is frustrating that you are content to continue doing your
own thing because I'm now tasked with implementing a DM target for
caching/HSM, as I touched on here:
http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html

I have little upfront incentive to make use of bcache because it doesn't
use DM.  Not to mention DM already has its own b-tree implementation
(granted, bcache is much more than its b+tree).  I obviously won't
ignore bcache (or flashcache) but I'm setting out to build on DM
infrastructure as effectively as possible.

My initial take on how to factor things is to split into 2 DM targets:
"hsm-cache" and "hsm".  These targets reuse the infrastructure that was
recently introduced for dm-thinp: drivers/md/persistent-data/ and
dm-bufio.

Like the "thin-pool" target, the "hsm-cache" target provides a central
resource (cache) that "hsm" target device(s) will attach to.  The
"hsm-cache" target, like thin-pool, will have a data and metadata
device, constructor:
hsm-cache <metadata dev> <data dev> <data block size (sectors)> 

The "hsm" target will pair an hsm-cache device with a backing device,
constructor:
hsm <dev_id> <cache_dev> <backing_dev>

The same hsm-cache device may be used by multiple hsm devices.  So I
mean this is the same high-level architecture as bcache (shared SSD
cache).
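[Editorial note: the constructor lines above can be made concrete with a small validator. This is an illustrative sketch - the "hsm-cache"/"hsm" target names and argument order come from the proposal above, which was a design sketch at the time; the parser itself is hypothetical:]

```python
def parse_hsm_table(line):
    """Parse one of the proposed DM table lines:
         hsm-cache <metadata dev> <data dev> <data block size (sectors)>
         hsm <dev_id> <cache_dev> <backing_dev>
    Returns (target, args-dict); raises ValueError on a malformed line."""
    fields = line.split()
    if not fields:
        raise ValueError("empty table line")
    target, args = fields[0], fields[1:]
    if target == "hsm-cache":
        # block size must be a plain sector count
        if len(args) != 3 or not args[2].isdigit():
            raise ValueError("hsm-cache needs <metadata dev> <data dev> <sectors>")
        return target, {"metadata_dev": args[0], "data_dev": args[1],
                        "block_sectors": int(args[2])}
    if target == "hsm":
        if len(args) != 3:
            raise ValueError("hsm needs <dev_id> <cache_dev> <backing_dev>")
        return target, {"dev_id": args[0], "cache_dev": args[1],
                        "backing_dev": args[2]}
    raise ValueError(f"unknown target {target!r}")

t, a = parse_hsm_table("hsm-cache /dev/mapper/meta /dev/mapper/data 128")
assert t == "hsm-cache" and a["block_sectors"] == 128
```

Multiple "hsm" lines naming the same cache_dev model the shared-cache architecture described above.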

Where things get interesting is the mechanics of the caching and the
metadata.  I'm coming to terms with the metadata now (based on desired
features and cache replacement policies), once it is nailed down I
expect things to fall into place pretty quickly.

I'm very early in the design but hope to have an initial functional
version of the code together in time for LSF -- ~2 weeks may be too
ambitious but it's my goal (could be more doable if I confine the
initial code to writethrough with LRU).
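[Editorial note: write-through with LRU - the confined initial scope mentioned above - is the simplest caching policy to model, since the cache never holds dirty data. An illustrative sketch (block-keyed dict as a stand-in backing device; all names hypothetical):]

```python
from collections import OrderedDict

class WritethroughLRU:
    """Toy write-through cache: every write goes to backing storage
    immediately, so the cache is never dirty; reads are served from
    cache when possible, with least-recently-used eviction."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing          # dict standing in for the backing device
        self.cache = OrderedDict()      # block -> data, in LRU order

    def write(self, block, data):
        self.backing[block] = data      # write-through: backing updated first
        self._insert(block, data)

    def read(self, block):
        if block in self.cache:         # hit: refresh LRU position
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.backing[block]      # miss: fill from backing
        self._insert(block, data)
        return data

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

backing = {}
c = WritethroughLRU(capacity=2, backing=backing)
c.write(0, b"a"); c.write(1, b"b"); c.write(2, b"c")
assert 0 not in c.cache          # block 0 evicted (LRU)
assert backing[0] == b"a"        # but write-through kept backing current
assert c.read(0) == b"a"         # the miss is refilled from backing
```

Because nothing in the cache is ever dirty, eviction and teardown need no metadata writeback, which is what makes this the natural first milestone.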

Mike

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] Bcache
  2012-03-14 22:01     ` Bcache Mike Snitzer
@ 2012-03-14 22:09       ` Williams, Dan J
  2012-03-15 17:27       ` Bcache Kent Overstreet
  1 sibling, 0 replies; 21+ messages in thread
From: Williams, Dan J @ 2012-03-14 22:09 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Kent Overstreet, linux-scsi, Christoph Hellwig, dm-devel, nauman,
	lsf-pc, Vivek Goyal

On Wed, Mar 14, 2012 at 3:01 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> I'm very early in the design but hope to have an initial functional
> version of the code together in time for LSF -- ~2 weeks may be too
> ambitious but it's my goal (could be more doable if I confine the
> initial code to writethrough with LRU).

I'm hoping caching ends up being as successful as the raid456
unification where we can have a dm or md interface in front of some
common infrastructure.  The inertia for md is to keep it close to all
the recent software raid advancements; the inertia for dm is also
clear; the inertia for something brand new... not very clear.

--
Dan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 19:04           ` chetan loke
@ 2012-03-15 17:01             ` Kent Overstreet
  0 siblings, 0 replies; 21+ messages in thread
From: Kent Overstreet @ 2012-03-15 17:01 UTC (permalink / raw)
  To: chetan loke; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 03:04:52PM -0400, chetan loke wrote:
> On Wed, Mar 14, 2012 at 2:41 PM, Kent Overstreet <koverstreet@google.com> wrote:
> > If you want me to implement bcache differently, shouldn't you explain why
> 
> Relax. I explained it already, but you are being defensive about your
> code. flash-cache works, period. And I may be wrong, but it is GPL'd.
> If perf is an issue, then at least let everyone know how it can be
> improved rather than saying my way or the highway. Aren't you saying
> in your patches - support for thin provisioning etc.? But if dm
> provides it, then why are you duplicating code?

I'm not defensive about my code; you asked why someone would be
interested in bcache vs. flash cache, and performance is the most
obvious reason. Seems kind of ridiculous to then accuse me of being
defensive.

If you want to know how the performance of flash cache can be improved,
bcache's design is documented and the code is available. I'm not
interested in flash cache and improving it isn't my job; furthermore
bcache's performance comes from fundamental design decisions so I don't
think flash cache is ever going to approach bcache's performance.

> > why? I'm not sure why I _have_ to justify my decisions to you.
> 
> Others might want to contribute to it and not just consume it. This
> ain't your local sandbox, so it's quite common to get such questions
> when you are trying to add new functionality. Maybe I missed some of
> your emails; if so, point me to them.

Helping others get involved is rather different - I'm perfectly fine to
help anyone who's interested, and I spent quite a lot of time
documenting and explaining the code, and helping users out.

But I'm just not interested in justifying bcache's existence vs.
flashcache. If you like flashcache better, it's no skin off my back.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 18:54         ` Ted Ts'o
  2012-03-14 19:22           ` chetan loke
@ 2012-03-15 17:02           ` Kent Overstreet
  1 sibling, 0 replies; 21+ messages in thread
From: Kent Overstreet @ 2012-03-15 17:02 UTC (permalink / raw)
  To: Ted Ts'o, chetan loke, Vivek Goyal, lsf-pc, nauman,
	linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 02:54:56PM -0400, Ted Ts'o wrote:
> On Wed, Mar 14, 2012 at 02:33:25PM -0400, chetan loke wrote:
> > But you are not explaining why dm is not the right stack. Just because
> > it crashed when you tried doesn't mean it's not the right place.
> > flash-cache works, doesn't it? Is flash-cache's limitation because
> > it's a dm target, because it uses hashing, or something else?
> > There are start-ups doing quite well with SSD-cache+dm. So
> > please stop kidding yourself.
> 
> SATA-attached flash is not the only kind of flash out there, you know.
> There is also PCIe-attached flash, which is a wee bit faster (where wee
> is defined as multiple orders of magnitude --- SATA-attached SSDs
> typically have thousands of IOPS; Fusion I/O is shipping product today
> with hundreds of thousands of IOPS, and has demonstrated a billion
> IOPS early this year).  And Fusion I/O isn't the only company shipping
> PCIe-attached flash products.
> 
> Startups may be doing great on SSDs; you may want to accept the fact
> that there is stuff out there which is way, way, way better than the
> SSDs available on the market *today*.
> 
> And it's not as if bcache is a new project.  It's working code,
> just like flash cache is today.  So it's not as if it needs to justify
> its existence.
> 
> Best regards,
> 
> 					- Ted

Thanks Ted, as usual you word things rather less abrasively than me :)

* Re: Bcache
  2012-03-14 22:01     ` Bcache Mike Snitzer
  2012-03-14 22:09       ` [Lsf-pc] Bcache Williams, Dan J
@ 2012-03-15 17:27       ` Kent Overstreet
  2012-03-15 20:17         ` Bcache Mike Snitzer
  1 sibling, 1 reply; 21+ messages in thread
From: Kent Overstreet @ 2012-03-15 17:27 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel,
	Christoph Hellwig

On Wed, Mar 14, 2012 at 06:01:50PM -0400, Mike Snitzer wrote:
> On Wed, Mar 14 2012 at  1:24pm -0400,
> Kent Overstreet <koverstreet@google.com> wrote:
> 
> > On Wed, Mar 14, 2012 at 11:53 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote:
> > >> I'm already registered to attend, but would it be too late in the
> > >> process to give a talk? I'd like to give a short talk about bcache, what
> > >> it does and where it's going (more than just caching).
> > >
> > > [CCing dm-devel list]
> > >
> > > I am curious if you considered writing a device mapper driver for this? If
> > > yes, why that is not a good choice. It seems to be stacked device and device
> > > mapper should be good at that. All the configuration through sysfs seems
> > > little odd to me.
> > 
> > Everyone asks this. Yeah, I considered it, I tried to make it work for
> > a couple weeks but it was far more trouble than it was worth. I'm not
> > opposed to someone else working on it but I'm not going to spend any
> > more time on it myself.
> 
> I really wish you'd have worked with dm-devel more persistently; you did
> post twice to dm-devel (at an awkward time of year, but whatever):
> http://www.redhat.com/archives/dm-devel/2010-December/msg00204.html
> http://www.redhat.com/archives/dm-devel/2010-December/msg00232.html

I spent quite a bit of time talking to Heinz Mauelshagen and someone
else whose name escapes me; I also spent around two weeks working on
bcache-dm code before I decided it was unworkable.

And bcache is two years old now, if the dm guys wanted bcache to use dm
there's been ample opportunity; nobody's been interested enough to do
anything about it. I'm still not against a bcache-dm interface, if
someone else can make it work - I just really have no interest or reason
to write the code myself. It works fine as it is.

> But somewhere along the way you privately gave up on DM... and have
> since repeatedly talked critically of DM.  Yet you have _never_
> substantiated _why_ DM is "far more trouble than it was worth", etc.

I have; I can't blame you for missing it, but honestly this comes up
constantly. People ask me (often accusingly) why bcache doesn't use
dm, and it gets really old. I've got better things to do.

Frankly, my biggest complaint with DM is that the code is _terrible_
and very poorly documented. It's an inflexible framework that tries to
combine a bunch of things that should be orthogonal. My other complaints
all stem from that; it became very clear that it wasn't designed for
creating a block device from the kernel, which is kind of necessary (at
least the only sane way of doing it, IMO) when metadata is managed by
the kernel (and the kernel has to manage most metadata for bcache).

> Reading between the lines on a previous LKML bcache threads where the
> questions of "why not use DM or MD?" came up:
> https://lkml.org/lkml/2011/9/11/117
> https://lkml.org/lkml/2011/9/15/376
> 
> It seemed your primary focus was on getting into the details of the SSD
> caching ASAP -- because that is what interested you.  Both DM and MD
> have a learning curve, maybe it was too frustrating and/or
> distracting to tackle.
> 
> Anyway, I don't fault you for initially doing your own thing for a
> virtual device framework -- it allowed you to get to the stuff you
> really cared about sooner.
> 
> That said, it is frustrating that you are content to continue doing your
> own thing because I'm now tasked with implementing a DM target for
> caching/HSM, as I touched on here:
> http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html

Kind of presumptuous, don't you think?

I've nothing at all against collaborating, or you or other dm devs
adapting bcache code - I'd help out with that!

But I'm just not going to write my code a certain way just to suit you.

> I have little upfront incentive to make use of bcache because it doesn't
> use DM.  Not to mention DM already has its own b-tree implementation
> (granted bcache is much more than its b+tree).  I obviously won't
> ignore bcache (or flashcache) but I'm setting out to build on DM
> infrastructure as effectively as possible.

Oh, darn.

> My initial take on how to factor things is to split into 2 DM targets:
> "hsm-cache" and "hsm".  These targets reuse the infrastructure that was
> recently introduced for dm-thinp: drivers/md/persistent-data/ and
> dm-bufio.
> 
> Like the "thin-pool" target, the "hsm-cache" target provides a central
> resource (cache) that "hsm" target device(s) will attach to.  The
> "hsm-cache" target, like thin-pool, will have a data and metadata
> device, constructor:
> hsm-cache <metadata dev> <data dev> <data block size (sectors)> 
> 
> The "hsm" target will pair an hsm-cache device with a backing device,
> constructor:
> hsm <dev_id> <cache_dev> <backing_dev>
> 
> The same hsm-cache device may be used by multiple hsm devices.  So I
> mean this is the same high-level architecture as bcache (shared SSD
> cache).
> 
> Where things get interesting is the mechanics of the caching and the
> metadata.  I'm coming to terms with the metadata now (based on desired
> features and cache replacement policies), once it is nailed down I
> expect things to fall into place pretty quickly.
> 
> I'm very early in the design but hope to have an initial functional
> version of the code together in time for LSF -- ~2 weeks may be too
> ambitious but it's my goal (could be more doable if I confine the
> initial code to writethrough with LRU).
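
[For concreteness, the proposed "hsm-cache" and "hsm" constructors above
would translate into dmsetup tables roughly as sketched below. These
targets are a design proposal, not code that exists in any kernel, so
the tables will not load anywhere; the device paths, sector counts, and
dev_id values are invented for illustration.]

```shell
# hsm-cache <metadata dev> <data dev> <data block size (sectors)>
# A 1 GiB cache pool (2097152 sectors) with 128 KiB (256-sector) blocks:
dmsetup create hsm-cache-pool --table \
    "0 2097152 hsm-cache /dev/sdc1 /dev/sdc2 256"

# hsm <dev_id> <cache_dev> <backing_dev>
# Attach a backing device to the shared cache pool:
dmsetup create hsm0 --table \
    "0 976773168 hsm 0 /dev/mapper/hsm-cache-pool /dev/sdb"

# A second hsm device can attach to the same cache pool:
dmsetup create hsm1 --table \
    "0 976773168 hsm 1 /dev/mapper/hsm-cache-pool /dev/sdd"
```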

Look forward to seeing the benchmarks.

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-14 17:24   ` Kent Overstreet
  2012-03-14 22:01     ` Bcache Mike Snitzer
@ 2012-03-15 19:43     ` Vivek Goyal
  2012-03-15 23:46       ` Kent Overstreet
  1 sibling, 1 reply; 21+ messages in thread
From: Vivek Goyal @ 2012-03-15 19:43 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 01:24:08PM -0400, Kent Overstreet wrote:

[..]
> 
> Can you post the full log? There was a bug where if it encountered an
> error during registration, it wouldn't wait for a uuid read or write
> before tearing everything down - that's what your backtrace looks like
> to me.
> 
> You could try the bcache-3.2-dev branch, too. I have a newer branch
> with a ton of bugfixes but I'm waiting until it's seen more testing
> before I post it.

Faced the same issue on bcache-3.2-dev branch too.

login: [  167.532932] bio: create slab <bio-1> at 1
[  167.539071] bcache: invalidating existing data
[  167.547604] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[  167.548573] CPU 2 
[  167.548573] Modules linked in: floppy [last unloaded: scsi_wait_scan]
[  167.548573] 
[  167.548573] Pid: 0, comm: swapper/2 Not tainted 3.2.0-bcache+ #4
Hewlett-Packard HP xw6600 Workstation/0A9Ch
[  167.548573] RIP: 0010:[<ffffffff8144d6fe>]  [<ffffffff8144d6fe>]
closure_put+0xe/0x20
[  167.548573] RSP: 0018:ffff88013fc83c60  EFLAGS: 00010246
[  167.548573] RAX: 0000000000000000 RBX: ffff8801385b04a0 RCX:
0000000000000000
[  167.548573] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
6b6b6b6b6b6b6b6b
[  167.548573] RBP: ffff88013fc83c60 R08: 0000000000000000 R09:
0000000000000001
[  167.548573] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[  167.548573] R13: ffff880137719580 R14: 0000000000080000 R15:
0000000000000000
[  167.548573] FS:  0000000000000000(0000) GS:ffff88013fc80000(0000)
knlGS:0000000000000000
[  167.548573] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  167.548573] CR2: 00007f6e84f70240 CR3: 000000013707d000 CR4:
00000000000006e0
[  167.548573] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  167.548573] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  167.548573] Process swapper/2 (pid: 0, threadinfo ffff88013a454000,
task ffff88013a458000)
[  167.548573] Stack:
[  167.548573]  ffff88013fc83c80 ffffffff814448c6 ffffffff00000000
ffff8801385b04a0
[  167.548573]  ffff88013fc83c90 ffffffff8117ae8d ffff88013fc83cc0
ffffffff812e2273
[  167.548573]  ffff88013a454000 0000000000000000 ffff8801385b04a0
0000000000080000
[  167.548573] Call Trace:
[  167.548573]  <IRQ> 
[  167.548573]  [<ffffffff814448c6>] uuid_endio+0x36/0x40
[  167.548573]  [<ffffffff8117ae8d>] bio_endio+0x1d/0x40
[  167.548573]  [<ffffffff812e2273>] req_bio_endio+0x83/0xc0
[  167.548573]  [<ffffffff812e53e1>] blk_update_request+0x101/0x5c0
[  167.548573]  [<ffffffff812e5612>] ? blk_update_request+0x332/0x5c0
[  167.548573]  [<ffffffff812e58d1>] blk_update_bidi_request+0x31/0x90
[  167.548573]  [<ffffffff812e595c>] blk_end_bidi_request+0x2c/0x80
[  167.548573]  [<ffffffff812e59f0>] blk_end_request+0x10/0x20
[  167.548573]  [<ffffffff81458fdc>] scsi_io_completion+0x9c/0x5f0
[  167.548573]  [<ffffffff8144fcd0>] scsi_finish_command+0xb0/0xe0
[  167.548573]  [<ffffffff81458dc5>] scsi_softirq_done+0xa5/0x140
[  167.548573]  [<ffffffff812ec55b>] blk_done_softirq+0x7b/0x90
[  167.548573]  [<ffffffff810512ae>] __do_softirq+0xce/0x3c0
[  167.548573]  [<ffffffff817e84ac>] call_softirq+0x1c/0x30
[  167.548573]  [<ffffffff8100417d>] do_softirq+0x8d/0xc0
[  167.548573]  [<ffffffff810518de>] irq_exit+0xae/0xe0
[  167.548573]  [<ffffffff817e8bb3>] do_IRQ+0x63/0xe0
[  167.548573]  [<ffffffff817de1f0>] common_interrupt+0x70/0x70
[  167.548573]  <EOI> 
[  167.548573]  [<ffffffff8100a5f6>] ? mwait_idle+0xb6/0x490
[  167.548573]  [<ffffffff8100a5ed>] ? mwait_idle+0xad/0x490
[  167.548573]  [<ffffffff810011e6>] cpu_idle+0x96/0xe0
[  167.548573]  [<ffffffff817cb475>] start_secondary+0x1be/0x1c2
[  167.548573] Code: ee 01 00 00 10 e8 03 ff ff ff 48 85 db 75 de 5b 41 5c
5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 be ff ff ff ff
<f0> 0f c1 77 48 83 ee 01 e8 d5 fe ff ff 5d c3 0f 1f 00 55 48 89 
[  167.548573] RIP  [<ffffffff8144d6fe>] closure_put+0xe/0x20
[  167.548573]  RSP <ffff88013fc83c60>

Thanks
Vivek

* Re: Bcache
  2012-03-15 17:27       ` Bcache Kent Overstreet
@ 2012-03-15 20:17         ` Mike Snitzer
  2012-03-15 22:59           ` Bcache Kent Overstreet
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2012-03-15 20:17 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel,
	Christoph Hellwig

On Thu, Mar 15 2012 at  1:27pm -0400,
Kent Overstreet <koverstreet@google.com> wrote:

> On Wed, Mar 14, 2012 at 06:01:50PM -0400, Mike Snitzer wrote:
> > I really wish you'd have worked with dm-devel more persistently; you did
> > post twice to dm-devel (at an awkward time of year, but whatever):
> > http://www.redhat.com/archives/dm-devel/2010-December/msg00204.html
> > http://www.redhat.com/archives/dm-devel/2010-December/msg00232.html
> 
> I spent quite a bit of time talking to Heinz Mauelshagen and someone
> else whose name escapes me; I also spent around two weeks working on
> bcache-dm code before I decided it was unworkable.
> 
> And bcache is two years old now, if the dm guys wanted bcache to use dm
> there's been ample opportunity; nobody's been interested enough to do
> anything about it. I'm still not against a bcache-dm interface, if
> someone else can make it work - I just really have no interest or reason
> to write the code myself. It works fine as it is.

Your interest should be in getting the hard work you've put into bcache
upstream.  That's unlikely to happen until you soften on your reluctance
to embrace existing appropriate kernel interfaces.

> Frankly, my biggest complaint with DM is that the code is _terrible_
> and very poorly documented. It's an inflexible framework that tries to
> combine a bunch of things that should be orthogonal. My other complaints
> all stem from that; it became very clear that it wasn't designed for
> creating a block device from the kernel, which is kind of necessary (at
> least the only sane way of doing it, IMO) when metadata is managed by
> the kernel (and the kernel has to manage most metadata for bcache).

Baseless and unspecific assertions don't help your cause -- dm-thinp
disproves your unconvincing position (it manages its metadata in the
kernel, etc).

Seems pretty clear you couldn't care less about _really_ working together
-- maybe it's just this DM/kernel interface thing that gets you down.

Regardless, the burden is on me (and all developers who have a desire to
see a caching/HSM driver get upstream) to evaluate bcache.  That process
has started -- hopefully it'll be as simple as:

1) put a DM target wrapper in place of your sysfs interface.
2) switch/port bcache's btree over to drivers/md/persistent-data/
3) dm-bcache FTW

One could dream.
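
[Step 1 of that list is the standard DM target boilerplate. The skeleton
below is a non-runnable sketch against the 3.x-era device-mapper target
API; the "bcache" target name and the empty callbacks are hypothetical,
not code from either project.]

```c
/* Sketch only: wires a hypothetical "bcache" DM target into the
 * 3.x-era device-mapper core. All of the actual logic is stubbed out. */
#include <linux/device-mapper.h>
#include <linux/module.h>

static int bcache_dm_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
	/* Would parse <cache_dev> <backing_dev> and attach to a cache
	 * set, replacing the echo-to-/sys/fs/bcache/register interface. */
	return 0;
}

static void bcache_dm_dtr(struct dm_target *ti)
{
	/* Would detach from the cache set. */
}

static int bcache_dm_map(struct dm_target *ti, struct bio *bio,
			 union map_info *map_context)
{
	/* Would hand the bio to bcache's request path. */
	return DM_MAPIO_SUBMITTED;
}

static struct target_type bcache_dm_target = {
	.name    = "bcache",
	.version = {0, 1, 0},
	.module  = THIS_MODULE,
	.ctr     = bcache_dm_ctr,
	.dtr     = bcache_dm_dtr,
	.map     = bcache_dm_map,
};

static int __init bcache_dm_init(void)
{
	return dm_register_target(&bcache_dm_target);
}

static void __exit bcache_dm_exit(void)
{
	dm_unregister_target(&bcache_dm_target);
}

module_init(bcache_dm_init);
module_exit(bcache_dm_exit);
MODULE_LICENSE("GPL");
```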

From the little I've looked at bcache, it already seems unrealistic; for
starters you have the btree wired directly to bio submission.
drivers/md/persistent-data/ offers a layered approach,
dm-block-manager.c brokers the IO submission (via dm-bufio) so the
management of the btree(s) doesn't need to be concerned with actual IO.

bcache is _very_ tightly coupled with your btree implementation.

> > Reading between the lines on a previous LKML bcache threads where the
> > questions of "why not use DM or MD?" came up:
> > https://lkml.org/lkml/2011/9/11/117
> > https://lkml.org/lkml/2011/9/15/376
> > 
> > It seemed your primary focus was on getting into the details of the SSD
> > caching ASAP -- because that is what interested you.  Both DM and MD
> > have a learning curve, maybe it was too frustrating and/or
> > distracting to tackle.
> > 
> > Anyway, I don't fault you for initially doing your own thing for a
> > virtual device framework -- it allowed you to get to the stuff you
> > really cared about sooner.
> > 
> > That said, it is frustrating that you are content to continue doing your
> > own thing because I'm now tasked with implementing a DM target for
> > caching/HSM, as I touched on here:
> > http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html
> 
> Kind of presumptuous, don't you think?

Not really, considering what I'm responding to at the moment ;)

> I've nothing at all against collaborating, or you or other dm devs
> adapting bcache code - I'd help out with that!

OK.

> But I'm just not going to write my code a certain way just to suit you.

upstream kumbaya: more cooperative eyes on the problem, working to hook
into established interfaces, will produce a solution that is worthy of
upstream inclusion.

> Look forward to seeing the benchmarks.

Speaking of which, weren't you saying you'd show bcache benchmarks in a
previous LKML thread?

* Re: Bcache
  2012-03-15 20:17         ` Bcache Mike Snitzer
@ 2012-03-15 22:59           ` Kent Overstreet
  2012-03-16  1:45             ` Bcache Mike Snitzer
  0 siblings, 1 reply; 21+ messages in thread
From: Kent Overstreet @ 2012-03-15 22:59 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel,
	Christoph Hellwig

On Thu, Mar 15, 2012 at 04:17:32PM -0400, Mike Snitzer wrote:
> Your interest should be in getting the hard work you've put into bcache
> upstream.  That's unlikely to happen until you soften on your reluctance
> to embrace existing appropriate kernel interfaces.

I don't really care what you think my priorities should be. I write code
first and foremost for myself, and the one thing I care about is good
code.

I'd love to have bcache in mainline, seeing more use and getting more
improvements - but if that's contingent on making it work through dm,
sorry, not interested.

If you want to convince me that dm is the right way to go you'll have
much better luck with technical arguments.

Besides which, I'm planning on (and very soon going to be working on)
growing bcache down into an FTL and up into the bottom half of a
filesystem. As far as I can tell integrating with dm would only get in
the way of that.

It's actually not as crazy as it sounds. The basic idea is to make the
index the central abstraction: allocation policies sit conceptually
underneath it and are abstracted out, and sitting on top, some
filesystem code (and possibly other things) uses the existing code as
if it were a kind of object storage; the existing bcache code then maps
inode number:offset -> lba instead of cached device:offset.

I'll explain more at LSF, but eventually it ought to look vaguely like
btrfs/zfs, but with better abstraction and better performance.

> > Frankly, my biggest complaint with DM is that the code is _terrible_
> > and very poorly documented. It's an inflexible framework that tries to
> > combine a bunch of things that should be orthogonal. My other complaints
> > all stem from that; it became very clear that it wasn't designed for
> > creating a block device from the kernel, which is kind of necessary (at
> > least the only sane way of doing it, IMO) when metadata is managed by
> > the kernel (and the kernel has to manage most metadata for bcache).
> 
> Baseless and unspecific assertions don't help your cause -- dm-thinp
> disproves your unconvincing position (manages it's metadata in kernel,
> etc).

I'm not the only one who's read the dm code and found it lacking; and
anyway, I'm not really out to convince anyone.

> Seems pretty clear you couldn't care less about _really_ working together
> -- maybe it's just this DM/kernel interface thing that gets you down.

Dude, I reached out to dm developers ages ago. Maybe if you guys had
shown some interest we wouldn't be having this conversation now.

This finger pointing is ridiculous and getting us nowhere.

> Regardless, the burden is on me (and all developers who have a desire to
> see a caching/HSM driver get upstream) to evaluate bcache.  That process
> has started -- hopefully it'll be as simple as:
> 
> 1) put a DM target wrapper in place of your sysfs interface.
> 2) switch/port bcache's btree over to drivers/md/persistent-data/
> 3) dm-bcache FTW

Replacing bcache's persistent metadata code? Hah. That's the central
part of the design!

Is this the way new filesystems are evaluated? No, it's not. What makes
you more special than ext4?

> One could dream.
> 
> From the little I've looked at bcache, it already seems unrealistic; for
> starters you have the btree wired directly to bio submission.
> drivers/md/persistent-data/ offers a layered approach,
> dm-block-manager.c brokers the IO submission (via dm-bufio) so the
> management of the btree(s) doesn't need to be concerned with actual IO.
> 
> bcache is _very_ tightly coupled with your btree implementation.

Yes, it is! It really has to be; efficiently allocating buckets and
invalidating cached data relies on specific details of the btree
implementation.

The btree is _central_ to bcache; if you ignore that, the rest of the
code isn't all that interesting.

> > > That said, it is frustrating that you are content to continue doing your
> > > own thing because I'm now tasked with implementing a DM target for
> > > caching/HSM, as I touched on here:
> > > http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html
> > 
> > Kind of presumptuous, don't you think?
> 
> Not really, considering what I'm responding to at the moment ;)

Maybe you should consider how you word things...

> > I've nothing at all against collaborating, or you or other dm devs
> > adapting bcache code - I'd help out with that!
> 
> OK.
> 
> > But I'm just not going to write my code a certain way just to suit you.
> 
> upstream kumbaya: more cooperative eyes on the problem, working to hook
> into established interfaces, will produce a solution that is worthy of
> upstream inclusion.

Let me be clear: All I care about is the best solution. I'm more than
happy to work with other people to achieve that, but I don't give a damn
about anything else.

> > Look forward to seeing the benchmarks.
> 
> Speaking of which, weren't you saying you'd show bcache benchmarks in a
> previous LKML thread?

Yeah I did, but as usual I got distracted. I'm travelling for the next
three weeks, but maybe I can get someone else to get some numbers that
we can publish...

* Re: [Lsf-pc] [Topic] Bcache
  2012-03-15 19:43     ` [Lsf-pc] [Topic] Bcache Vivek Goyal
@ 2012-03-15 23:46       ` Kent Overstreet
  0 siblings, 0 replies; 21+ messages in thread
From: Kent Overstreet @ 2012-03-15 23:46 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: lsf-pc, nauman, linux-scsi, dm-devel

On Thu, Mar 15, 2012 at 03:43:36PM -0400, Vivek Goyal wrote:
> On Wed, Mar 14, 2012 at 01:24:08PM -0400, Kent Overstreet wrote:
> 
> [..]
> > 
> > Can you post the full log? There was a bug where if it encountered an
> > error during registration, it wouldn't wait for a uuid read or write
> > before tearing everything down - that's what your backtrace looks like
> > to me.
> > 
> > You could try the bcache-3.2-dev branch, too. I have a newer branch
> > with a ton of bugfixes but I'm waiting until it's seen more testing
> > before I post it.
> 
> Faced the same issue on bcache-3.2-dev branch too.

Shoot.

Well, I know I fixed that bug (well, a bug with the same symptoms), and
I guess that branch was kind of old.

I just updated the bcache-3.2-dev branch to the newest vaguely possibly
tested code; the 3.2 version is only build tested (we're still
developing on 2.6.34).

> 
> login: [  167.532932] bio: create slab <bio-1> at 1
> [  167.539071] bcache: invalidating existing data
> [  167.547604] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> [  167.548573] CPU 2 
> [  167.548573] Modules linked in: floppy [last unloaded: scsi_wait_scan]
> [  167.548573] 
> [  167.548573] Pid: 0, comm: swapper/2 Not tainted 3.2.0-bcache+ #4
> Hewlett-Packard HP xw6600 Workstation/0A9Ch
> [  167.548573] RIP: 0010:[<ffffffff8144d6fe>]  [<ffffffff8144d6fe>]
> closure_put+0xe/0x20
> [  167.548573] RSP: 0018:ffff88013fc83c60  EFLAGS: 00010246
> [  167.548573] RAX: 0000000000000000 RBX: ffff8801385b04a0 RCX:
> 0000000000000000
> [  167.548573] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
> 6b6b6b6b6b6b6b6b
> [  167.548573] RBP: ffff88013fc83c60 R08: 0000000000000000 R09:
> 0000000000000001
> [  167.548573] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [  167.548573] R13: ffff880137719580 R14: 0000000000080000 R15:
> 0000000000000000
> [  167.548573] FS:  0000000000000000(0000) GS:ffff88013fc80000(0000)
> knlGS:0000000000000000
> [  167.548573] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  167.548573] CR2: 00007f6e84f70240 CR3: 000000013707d000 CR4:
> 00000000000006e0
> [  167.548573] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  167.548573] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [  167.548573] Process swapper/2 (pid: 0, threadinfo ffff88013a454000,
> task ffff88013a458000)
> [  167.548573] Stack:
> [  167.548573]  ffff88013fc83c80 ffffffff814448c6 ffffffff00000000
> ffff8801385b04a0
> [  167.548573]  ffff88013fc83c90 ffffffff8117ae8d ffff88013fc83cc0
> ffffffff812e2273
> [  167.548573]  ffff88013a454000 0000000000000000 ffff8801385b04a0
> 0000000000080000
> [  167.548573] Call Trace:
> [  167.548573]  <IRQ> 
> [  167.548573]  [<ffffffff814448c6>] uuid_endio+0x36/0x40
> [  167.548573]  [<ffffffff8117ae8d>] bio_endio+0x1d/0x40
> [  167.548573]  [<ffffffff812e2273>] req_bio_endio+0x83/0xc0
> [  167.548573]  [<ffffffff812e53e1>] blk_update_request+0x101/0x5c0
> [  167.548573]  [<ffffffff812e5612>] ? blk_update_request+0x332/0x5c0
> [  167.548573]  [<ffffffff812e58d1>] blk_update_bidi_request+0x31/0x90
> [  167.548573]  [<ffffffff812e595c>] blk_end_bidi_request+0x2c/0x80
> [  167.548573]  [<ffffffff812e59f0>] blk_end_request+0x10/0x20
> [  167.548573]  [<ffffffff81458fdc>] scsi_io_completion+0x9c/0x5f0
> [  167.548573]  [<ffffffff8144fcd0>] scsi_finish_command+0xb0/0xe0
> [  167.548573]  [<ffffffff81458dc5>] scsi_softirq_done+0xa5/0x140
> [  167.548573]  [<ffffffff812ec55b>] blk_done_softirq+0x7b/0x90
> [  167.548573]  [<ffffffff810512ae>] __do_softirq+0xce/0x3c0
> [  167.548573]  [<ffffffff817e84ac>] call_softirq+0x1c/0x30
> [  167.548573]  [<ffffffff8100417d>] do_softirq+0x8d/0xc0
> [  167.548573]  [<ffffffff810518de>] irq_exit+0xae/0xe0
> [  167.548573]  [<ffffffff817e8bb3>] do_IRQ+0x63/0xe0
> [  167.548573]  [<ffffffff817de1f0>] common_interrupt+0x70/0x70
> [  167.548573]  <EOI> 
> [  167.548573]  [<ffffffff8100a5f6>] ? mwait_idle+0xb6/0x490
> [  167.548573]  [<ffffffff8100a5ed>] ? mwait_idle+0xad/0x490
> [  167.548573]  [<ffffffff810011e6>] cpu_idle+0x96/0xe0
> [  167.548573]  [<ffffffff817cb475>] start_secondary+0x1be/0x1c2
> [  167.548573] Code: ee 01 00 00 10 e8 03 ff ff ff 48 85 db 75 de 5b 41 5c
> 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 be ff ff ff ff
> <f0> 0f c1 77 48 83 ee 01 e8 d5 fe ff ff 5d c3 0f 1f 00 55 48 89 
> [  167.548573] RIP  [<ffffffff8144d6fe>] closure_put+0xe/0x20
> [  167.548573]  RSP <ffff88013fc83c60>
> 
> Thanks
> Vivek

* Re: Bcache
  2012-03-15 22:59           ` Bcache Kent Overstreet
@ 2012-03-16  1:45             ` Mike Snitzer
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Snitzer @ 2012-03-16  1:45 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel,
	Christoph Hellwig

On Thu, Mar 15 2012 at  6:59pm -0400,
Kent Overstreet <koverstreet@google.com> wrote:

> On Thu, Mar 15, 2012 at 04:17:32PM -0400, Mike Snitzer wrote:
> > Your interest should be in getting the hard work you've put into bcache
> > upstream.  That's unlikely to happen until you soften on your reluctance
> > to embrace existing appropriate kernel interfaces.
> 
> I don't really care what you think my priorities should be. I write code
> first and foremost for myself, and the one thing I care about is good
> code.
> 
> I'd love to have bcache in mainline, seeing more use and getting more
> improvements - but if that's contingent on making it work through dm,
> sorry, not interested.
> 
> If you want to convince me that dm is the right way to go you'll have
> much better luck with technical arguments.

We have quite a lot of code that illustrates how to implement DM
targets.  DM isn't forcing undue or cumbersome constraints that prevent
it's use for complex targets with in-kernel metadata -- again dm-thinp
proves this.

It is your burden to even begin to substantiate _why_ both DM and MD are
inadequate frameworks for virtual block device drivers.

> > Baseless and unspecific assertions don't help your cause -- dm-thinp
> > disproves your unconvincing position (it manages its metadata in the
> > kernel, etc).
> 
> I'm not the only one who's read the dm code and found it lacking - and
> anyways, I'm not really out to convince anyone. 

Like other kernel code, DM is approachable for those who are willing to
put the time in to understand it.  Your hand-waving (and now proxy)
critiques leave us nothing to work with.

> > > Kind of presumptuous, don't you think?
> > 
> > Not really, considering what I'm responding to at the moment ;)
> 
> Maybe you should consider how you word things...

Say what?  Nice projection.  Luckily the thread is public for all to see.

I initially thought Christoph's feedback in this thread was harsh; now
it seems eerily prophetic.

Let's stop wasting our time on this thread.  Maybe we can be more
constructive in the future.

end of thread

Thread overview: 21+ messages
2012-03-14 13:32 [Topic] Bcache Kent Overstreet
2012-03-14 15:53 ` [Lsf-pc] " Vivek Goyal
2012-03-14 17:24   ` Kent Overstreet
2012-03-14 22:01     ` Bcache Mike Snitzer
2012-03-14 22:09       ` [Lsf-pc] Bcache Williams, Dan J
2012-03-15 17:27       ` Bcache Kent Overstreet
2012-03-15 20:17         ` Bcache Mike Snitzer
2012-03-15 22:59           ` Bcache Kent Overstreet
2012-03-16  1:45             ` Bcache Mike Snitzer
2012-03-15 19:43     ` [Lsf-pc] [Topic] Bcache Vivek Goyal
2012-03-15 23:46       ` Kent Overstreet
2012-03-14 18:12   ` chetan loke
2012-03-14 18:17     ` Kent Overstreet
2012-03-14 18:33       ` chetan loke
2012-03-14 18:41         ` Kent Overstreet
2012-03-14 18:47           ` Christoph Hellwig
2012-03-14 19:04           ` chetan loke
2012-03-15 17:01             ` Kent Overstreet
2012-03-14 18:54         ` Ted Ts'o
2012-03-14 19:22           ` chetan loke
2012-03-15 17:02           ` Kent Overstreet
