* [Topic] Bcache
From: Kent Overstreet @ 2012-03-14 13:32 UTC (permalink / raw)
To: lsf-pc; +Cc: linux-scsi, nauman

I'm already registered to attend, but would it be too late in the
process to give a talk? I'd like to give a short talk about bcache: what
it does and where it's going (more than just caching).

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache
From: Vivek Goyal @ 2012-03-14 15:53 UTC (permalink / raw)
To: Kent Overstreet; +Cc: lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote:
> I'm already registered to attend, but would it be too late in the
> process to give a talk? I'd like to give a short talk about bcache, what
> it does and where it's going (more than just caching).

[CCing dm-devel list]

I am curious whether you considered writing a device mapper driver for
this, and if so, why that was not a good choice. It seems to be a stacked
device, and device mapper should be good at that. All the configuration
through sysfs seems a little odd to me.

On a side note, I was playing with bcache a bit. I tried to register the
cache device and it crashed. (I guess I should post this on the relevant
mailing list.)
# echo /dev/sdc > /sys/fs/bcache/register

[ 6758.314093] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-bcache+ #2 Hewlett-Packard HP xw6600 Workstation/0A9Ch
[ 6758.314093] RIP: 0010:[<ffffffff8146625b>]  [<ffffffff8146625b>] closure_put+0x5b/0xe0
[ 6758.314093] RSP: 0018:ffff88013fc83c60  EFLAGS: 00010246
[ 6758.314093] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801281204a0 RCX: 0000000000000000
[ 6758.314093] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff88013906ec48
[ 6758.314093] RBP: ffff88013fc83c60 R08: 0000000000000000 R09: 0000000000000001
[ 6758.314093] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 6758.314093] R13: ffff880130b58560 R14: 0000000000080000 R15: 0000000000000000
[ 6758.314093] FS:  0000000000000000(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
[ 6758.314093] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6758.314093] CR2: 00007f9becec7000 CR3: 0000000137fe0000 CR4: 00000000000006e0
[ 6758.314093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6758.314093] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6758.314093] Process kworker/0:1 (pid: 0, threadinfo ffff88013a44a000, task ffff88013a458000)
[ 6758.314093] Stack:
[ 6758.314093]  ffff88013fc83c80 ffffffff8145ee6d ffffffff00000000 ffff8801281204a0
[ 6758.314093]  ffff88013fc83c90 ffffffff81173a9d ffff88013fc83cc0 ffffffff812d15d3
[ 6758.314093]  ffff88013a44a000 0000000000000000 ffff8801281204a0 0000000000080000
[ 6758.314093] Call Trace:
[ 6758.314093]  <IRQ>
[ 6758.314093]  [<ffffffff8145ee6d>] uuid_endio+0x3d/0x50
[ 6758.314093]  [<ffffffff81173a9d>] bio_endio+0x1d/0x40
[ 6758.314093]  [<ffffffff812d15d3>] req_bio_endio+0x83/0xc0
[ 6758.314093]  [<ffffffff812d4f71>] blk_update_request+0x101/0x5c0
[ 6758.314093]  [<ffffffff812d51a2>] ? blk_update_request+0x332/0x5c0
[ 6758.314093]  [<ffffffff812d5461>] blk_update_bidi_request+0x31/0x90
[ 6758.314093]  [<ffffffff812d54ec>] blk_end_bidi_request+0x2c/0x80
[ 6758.314093]  [<ffffffff812d5580>] blk_end_request+0x10/0x20
[ 6758.314093]  [<ffffffff81471b7c>] scsi_io_completion+0x9c/0x5f0
[ 6758.314093]  [<ffffffff81468940>] scsi_finish_command+0xb0/0xe0
[ 6758.314093]  [<ffffffff81471965>] scsi_softirq_done+0xa5/0x140
[ 6758.314093]  [<ffffffff812db70b>] blk_done_softirq+0x7b/0x90
[ 6758.314093]  [<ffffffff8104fc65>] __do_softirq+0xc5/0x3a0
[ 6758.314093]  [<ffffffff817f6dac>] call_softirq+0x1c/0x30
[ 6758.314093]  [<ffffffff8100419d>] do_softirq+0x8d/0xc0
[ 6758.314093]  [<ffffffff8105027e>] irq_exit+0xae/0xe0
[ 6758.314093]  [<ffffffff817f74b3>] do_IRQ+0x63/0xe0
[ 6758.314093]  [<ffffffff817ecc30>] common_interrupt+0x70/0x70
[ 6758.314093]  <EOI>
[ 6758.314093]  [<ffffffff8100a1f6>] ? mwait_idle+0xb6/0x470
[ 6758.314093]  [<ffffffff8100a1ed>] ? mwait_idle+0xad/0x470
[ 6758.314093]  [<ffffffff810011df>] cpu_idle+0x8f/0xd0
[ 6758.314093]  [<ffffffff817da107>] start_secondary+0x1be/0x1c2
[ 6758.314093] Code: 00 48 8b 50 48 83 e2 08 0f 85 9c 00 00 00 48 8b 50 48 83 e2 10 0f 85 8d 00 00 00 48 83 78 18 00 75 46 48 8b 40 40 48 85 c0 74 24
[ 6758.314093]  8b 50 48 48 c1 ea 04 89 d1 89 f2 83 e1 01 f0 0f c1 50 4c 83
[ 6758.314093] RIP [<ffffffff8146625b>] closure_put+0x5b/0xe0
[ 6758.314093] RSP <ffff88013fc83c60>

Thanks
Vivek
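For readers who haven't used bcache's sysfs interface, the registration flow Vivek is exercising looks roughly like the following. This is an illustrative sketch based on the bcache userspace tooling; the device names are examples, and exact paths and tool flags may differ between bcache versions:

```shell
# Format a cache (SSD) device and a backing device -- example device names.
make-bcache -C /dev/sdc     # cache device
make-bcache -B /dev/sdb     # backing device

# Register each device with the kernel; this is the step that oopsed above.
echo /dev/sdc > /sys/fs/bcache/register
echo /dev/sdb > /sys/fs/bcache/register

# Attach the backing device to the cache set by the cache set's UUID;
# /dev/bcache0 is then usable as the cached block device.
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
```

These are device-administration commands, so they only make sense on a machine with a bcache-enabled kernel and spare block devices.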
* Re: [Lsf-pc] [Topic] Bcache
From: Kent Overstreet @ 2012-03-14 17:24 UTC (permalink / raw)
To: Vivek Goyal; +Cc: lsf-pc, nauman, linux-scsi, dm-devel

On Wed, Mar 14, 2012 at 11:53 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote:
>> I'm already registered to attend, but would it be too late in the
>> process to give a talk? I'd like to give a short talk about bcache, what
>> it does and where it's going (more than just caching).
>
> [CCing dm-devel list]
>
> I am curious if you considered writing a device mapper driver for this? If
> yes, why that is not a good choice. It seems to be stacked device and device
> mapper should be good at that. All the configuration through sysfs seems
> little odd to me.

Everyone asks this. Yeah, I considered it; I tried to make it work for a
couple of weeks, but it was far more trouble than it was worth. I'm not
opposed to someone else working on it, but I'm not going to spend any
more time on it myself.

> On a side note, I was playing with bcache a bit. I tried to register the
> cache device and it crashes. (I guess I should post this on relevant mailing
> list).

Can you post the full log? There was a bug where, if it encountered an
error during registration, it wouldn't wait for a uuid read or write
before tearing everything down - that's what your backtrace looks like
to me.

You could try the bcache-3.2-dev branch, too. I have a newer branch with
a ton of bugfixes, but I'm waiting until it's seen more testing before I
post it.
> # echo /dev/sdc > /sys/fs/bcache/register
>
> [ 6758.314093] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-bcache+ #2
> Hewlett-Packard HP xw6600 Workstation/0A9Ch
[remainder of quoted oops trace snipped; identical to the trace above]
* Re: Bcache
From: Mike Snitzer @ 2012-03-14 22:01 UTC (permalink / raw)
To: Kent Overstreet
Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel, Christoph Hellwig

On Wed, Mar 14 2012 at 1:24pm -0400,
Kent Overstreet <koverstreet@google.com> wrote:

> > I am curious if you considered writing a device mapper driver for this?
> [...]
> Everyone asks this. Yeah, I considered it, I tried to make it work for
> a couple weeks but it was far more trouble than it was worth. I'm not
> opposed to someone else working on it but I'm not going to spend any
> more time on it myself.

I really wish you'd have worked with dm-devel more persistently; you did
post twice to dm-devel (at an awkward time of year, but whatever):
http://www.redhat.com/archives/dm-devel/2010-December/msg00204.html
http://www.redhat.com/archives/dm-devel/2010-December/msg00232.html

But somewhere along the way you privately gave up on DM... and have
since repeatedly talked critically of DM. Yet you have _never_
substantiated _why_ DM is "far more trouble than it was worth", etc.

Reading between the lines on previous LKML bcache threads where the
question of "why not use DM or MD?" came up:
https://lkml.org/lkml/2011/9/11/117
https://lkml.org/lkml/2011/9/15/376

It seemed your primary focus was on getting into the details of the SSD
caching ASAP -- because that is what interested you. Both DM and MD have
a learning curve; maybe it was too frustrating and/or distracting to
tackle.

Anyway, I don't fault you for initially doing your own thing for a
virtual device framework -- it allowed you to get to the stuff you
really cared about sooner.

That said, it is frustrating that you are content to continue doing your
own thing, because I'm now tasked with implementing a DM target for
caching/HSM, as I touched on here:
http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html

I have little upfront incentive to make use of bcache because it doesn't
use DM. Not to mention DM already has its own b-tree implementation
(granted, bcache is much more than its b+tree). I obviously won't ignore
bcache (or flashcache), but I'm setting out to build on DM
infrastructure as effectively as possible.

My initial take on how to factor things is to split into 2 DM targets:
"hsm-cache" and "hsm". These targets reuse the infrastructure that was
recently introduced for dm-thinp: drivers/md/persistent-data/ and
dm-bufio.

Like the "thin-pool" target, the "hsm-cache" target provides a central
resource (cache) that "hsm" target device(s) will attach to. The
"hsm-cache" target, like thin-pool, will have a data and metadata
device, constructor:

  hsm-cache <metadata dev> <data dev> <data block size (sectors)>

The "hsm" target will pair an hsm-cache device with a backing device,
constructor:

  hsm <dev_id> <cache_dev> <backing_dev>

The same hsm-cache device may be used by multiple hsm devices, so this
is the same high-level architecture as bcache (a shared SSD cache).
Where things get interesting is the mechanics of the caching and the
metadata.
I'm coming to terms with the metadata now (based on desired features and
cache replacement policies); once it is nailed down, I expect things to
fall into place pretty quickly.

I'm very early in the design, but I hope to have an initial functional
version of the code together in time for LSF -- ~2 weeks may be too
ambitious, but it's my goal (it could be more doable if I confine the
initial code to writethrough with LRU).

Mike
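Given the constructor syntax described above, the proposed targets would presumably be driven with ordinary dmsetup tables. The following is purely hypothetical: neither target exists yet, and the device names, sizes, and target parameters are made up to illustrate how the constructors would map onto dmsetup usage:

```shell
# Build dmsetup table lines for the proposed targets (hypothetical syntax).
# A real setup would get device sizes from `blockdev --getsz`.
CACHE_SECTORS=62500000        # example SSD data area size, in 512-byte sectors
BACKING_SECTORS=1953125000    # example backing disk size

# hsm-cache <metadata dev> <data dev> <data block size (sectors)>
CACHE_TABLE="0 $CACHE_SECTORS hsm-cache /dev/sdc1 /dev/sdc2 512"

# hsm <dev_id> <cache_dev> <backing_dev>
HSM_TABLE="0 $BACKING_SECTORS hsm 0 /dev/mapper/hsmcache /dev/sdb"

echo "$CACHE_TABLE"
echo "$HSM_TABLE"

# Loading them would then look like:
#   dmsetup create hsmcache --table "$CACHE_TABLE"
#   dmsetup create hsm0 --table "$HSM_TABLE"
```

The `dmsetup create` calls are left as comments because the targets are only a design sketch; the point is that a shared cache device plus per-backing-device targets fits the standard device-mapper table model.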
* Re: [Lsf-pc] Bcache
From: Williams, Dan J @ 2012-03-14 22:09 UTC (permalink / raw)
To: Mike Snitzer
Cc: Kent Overstreet, linux-scsi, Christoph Hellwig, dm-devel, nauman,
    lsf-pc, Vivek Goyal

On Wed, Mar 14, 2012 at 3:01 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> I'm very early in the design but hope to have an initial functional
> version of the code together in time for LSF -- ~2 weeks may be too
> ambitious but it's my goal (could be more doable if I confine the
> initial code to writethrough with LRU).

I'm hoping caching ends up being as successful as the raid456
unification, where we can have a dm or md interface in front of some
common infrastructure. The inertia for md is to keep it close to all
the recent software raid advancements; the inertia for dm is also
clear; the inertia for something brand new... not very clear.

--
Dan
* Re: Bcache
From: Kent Overstreet @ 2012-03-15 17:27 UTC (permalink / raw)
To: Mike Snitzer
Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel, Christoph Hellwig

On Wed, Mar 14, 2012 at 06:01:50PM -0400, Mike Snitzer wrote:
[earlier quoted context snipped]
> I really wish you'd have worked with dm-devel more persistently, you did
> post twice to dm-devel (at an awkward time of year but whatever):
> http://www.redhat.com/archives/dm-devel/2010-December/msg00204.html
> http://www.redhat.com/archives/dm-devel/2010-December/msg00232.html

I spent quite a bit of time talking to Heinz Mauelshagen and someone
else whose name escapes me; I also spent around two weeks working on
bcache-dm code before I decided it was unworkable.
And bcache is two years old now; if the dm guys wanted bcache to use dm,
there has been ample opportunity - nobody's been interested enough to do
anything about it. I'm still not against a bcache-dm interface, if
someone else can make it work - I just really have no interest or reason
to write the code myself. It works fine as it is.

> But somewhere along the way you privately gave up on DM... and have
> since repeatedly talked critically of DM. Yet you have _never_
> substantiated _why_ DM is "far more trouble than it was worth", etc.

I have - I can't blame you for missing it, but honestly this comes up
constantly: people asking me (often accusingly) why bcache doesn't use
dm, and it gets really old. I've got better things to do.

Frankly, my biggest complaint with DM is that the code is _terrible_ and
very poorly documented. It's an inflexible framework that tries to
combine a bunch of things that should be orthogonal. My other complaints
all stem from that; it became very clear that it wasn't designed for
creating a block device from the kernel, which is kind of necessary (at
least the only sane way of doing it, IMO) when metadata is managed by
the kernel (and the kernel has to manage most metadata for bcache).

> Reading between the lines on a previous LKML bcache threads where the
> questions of "why not use DM or MD?" came up:
> https://lkml.org/lkml/2011/9/11/117
> https://lkml.org/lkml/2011/9/15/376
>
> It seemed your primary focus was on getting into the details of the SSD
> caching ASAP -- because that is what interested you. Both DM and MD
> have a learning curve, maybe it was too frustrating and/or
> distracting to tackle.
>
> Anyway, I don't fault you for initially doing your own thing for a
> virtual device framework -- it allowed you to get to the stuff you
> really cared about sooner.
> That said, it is frustrating that you are content to continue doing your
> own thing because I'm now tasked with implementing a DM target for
> caching/HSM, as I touched on here:
> http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html

Kind of presumptuous, don't you think?

I've nothing at all against collaborating, or you or other dm devs
adapting bcache code - I'd help out with that! But I'm just not going to
write my code a certain way just to suit you.

> I have little upfront incentive to make use of bcache because it doesn't
> use DM. Not to mention DM already has its own b-tree implementation
> (granted bcache is much more than it's b+tree). I obviously won't
> ignore bcache (or flashcache) but I'm setting out to build on DM
> infrastructure as effectively as possible.

Oh, darn.

> My initial take on how to factor things is to split into 2 DM targets:
> "hsm-cache" and "hsm". These targets reuse the infrastructure that was
> recently introduced for dm-thinp: drivers/md/persistent-data/ and
> dm-bufio.
>
> Like the "thin-pool" target, the "hsm-cache" target provides a central
> resource (cache) that "hsm" target device(s) will attach to. The
> "hsm-cache" target, like thin-pool, will have a data and metadata
> device, constructor:
>   hsm-cache <metadata dev> <data dev> <data block size (sectors)>
>
> The "hsm" target will pair an hsm-cache device with a backing device,
> constructor:
>   hsm <dev_id> <cache_dev> <backing_dev>
>
> The same hsm-cache device may be used by multiple hsm devices. So I
> mean this is the same high-level architecture as bcache (shared SSD
> cache).
>
> Where things get interesting is the mechanics of the caching and the
> metadata. I'm coming to terms with the metadata now (based on desired
> features and cache replacement policies), once it is nailed down I
> expect things to fall into place pretty quickly.
> I'm very early in the design but hope to have an initial functional
> version of the code together in time for LSF -- ~2 weeks may be too
> ambitious but it's my goal (could be more doable if I confine the
> initial code to writethrough with LRU).

I look forward to seeing the benchmarks.
* Re: Bcache
From: Mike Snitzer @ 2012-03-15 20:17 UTC (permalink / raw)
To: Kent Overstreet
Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel, Christoph Hellwig

On Thu, Mar 15 2012 at 1:27pm -0400,
Kent Overstreet <koverstreet@google.com> wrote:

> I spent quite a bit of time talking to Heinz Mauelshagen and someone
> else whose name escapes me; I also spent around two weeks working on
> bcache-dm code before I decided it was unworkable.
>
> And bcache is two years old now, if the dm guys wanted bcache to use dm
> there's been ample opportunity; nobody's been interested enough to do
> anything about it. I'm still not against a bcache-dm interface, if
> someone else can make it work - I just really have no interest or reason
> to write the code myself. It works fine as it is.

Your interest should be in getting the hard work you've put into bcache
upstream. That's unlikely to happen until you soften on your reluctance
to embrace existing appropriate kernel interfaces.

> Frankly, my biggest complaint with the DM is that the code is _terrible_
> and very poorly documented. It's an inflexible framework that tries to
> combine a bunch of things that should be orthogonal.
> My other complaints
> all stem from that; it became very clear that it wasn't designed for
> creating a block device from the kernel, which is kind of necessary (at
> least the only sane way of doing it, IMO) when metadata is managed by
> the kernel (and the kernel has to manage most metadata for bcache).

Baseless and unspecific assertions don't help your cause -- dm-thinp
disproves your unconvincing position (it manages its metadata in kernel,
etc).

It seems pretty clear you couldn't care less about _really_ working
together -- maybe it is just this DM/kernel interface thing that gets
you down. Regardless, the burden is on me (and all developers who have a
desire to see a caching/HSM driver get upstream) to evaluate bcache.
That process has started -- hopefully it'll be as simple as:

1) put a DM target wrapper in place of your sysfs interface.
2) switch/port bcache's btree over to drivers/md/persistent-data/
3) dm-bcache FTW

One could dream.

The little bit I've looked at bcache, it already seems unrealistic; for
starters, you have the btree wired directly to bio submission.
drivers/md/persistent-data/ offers a layered approach:
dm-block-manager.c brokers the IO submission (via dm-bufio), so the
management of the btree(s) doesn't need to be concerned with actual IO.

bcache is _very_ tightly coupled with your btree implementation.

> > Reading between the lines on a previous LKML bcache threads where the
> > questions of "why not use DM or MD?" came up:
> > https://lkml.org/lkml/2011/9/11/117
> > https://lkml.org/lkml/2011/9/15/376
> >
> > It seemed your primary focus was on getting into the details of the SSD
> > caching ASAP -- because that is what interested you. Both DM and MD
> > have a learning curve, maybe it was too frustrating and/or
> > distracting to tackle.
> >
> > Anyway, I don't fault you for initially doing your own thing for a
> > virtual device framework -- it allowed you to get to the stuff you
> > really cared about sooner.
> > That said, it is frustrating that you are content to continue doing your
> > own thing because I'm now tasked with implementing a DM target for
> > caching/HSM, as I touched on here:
> > http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html
>
> Kind of presumptuous, don't you think?

Not really, considering what I'm responding to at the moment ;)

> I've nothing at all against collaborating, or you or other dm devs
> adapting bcache code - I'd help out with that!

OK.

> But I'm just not going to write my code a certain way just to suit you.

Upstream kumbaya: more cooperative eyes on the problem, working to hook
into established interfaces, will produce a solution that is worthy of
upstream inclusion.

> Look forward to seeing the benchmarks.

Speaking of which, weren't you saying you'd show bcache benchmarks in a
previous LKML thread?
* Re: Bcache
From: Kent Overstreet @ 2012-03-15 22:59 UTC (permalink / raw)
To: Mike Snitzer
Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel, Christoph Hellwig

On Thu, Mar 15, 2012 at 04:17:32PM -0400, Mike Snitzer wrote:
> Your interest should be in getting the hard work you've put into bcache
> upstream. That's unlikely to happen until you soften on your reluctance
> to embrace existing appropriate kernel interfaces.

I don't really care what you think my priorities should be. I write code
first and foremost for myself, and the one thing I care about is good
code.

I'd love to have bcache in mainline, seeing more use and getting more
improvements - but if that's contingent on making it work through dm,
sorry, not interested. If you want to convince me that dm is the right
way to go, you'll have much better luck with technical arguments.

Besides which, I'm planning on (and very soon going to be working on)
growing bcache down into an FTL and up into the bottom half of a
filesystem. As far as I can tell, integrating with dm would only get in
the way of that.

It's actually not as crazy as it sounds - the basic idea is to make the
index the central abstraction; allocation policies sit conceptually
underneath and are abstracted out, and sitting on top, some filesystem
code (and possibly other things) uses the existing code as if it were
some kind of object-storage-like thing; the existing bcache code maps
inode number:offset -> lba instead of cached device:offset. I'll explain
more at LSF, but eventually it ought to look vaguely like btrfs/zfs, but
with better abstraction and better performance.

> > Frankly, my biggest complaint with the DM is that the code is _terrible_
> > and very poorly documented. It's an inflexible framework that tries to
> > combine a bunch of things that should be orthogonal.
> > My other complaints
> > all stem from that; it became very clear that it wasn't designed for
> > creating a block device from the kernel, which is kind of necessary (at
> > least the only sane way of doing it, IMO) when metadata is managed by
> > the kernel (and the kernel has to manage most metadata for bcache).
>
> Baseless and unspecific assertions don't help your cause -- dm-thinp
> disproves your unconvincing position (manages it's metadata in kernel,
> etc).

I'm not the only one who's read the dm code and found it lacking - and
anyway, I'm not really out to convince anyone.

> Seems pretty clear you could care less about _really_ working together
> -- maybe it is just this DM/kernel interface thing gets you down.

Dude, I reached out to dm developers ages ago. Maybe if you guys had
shown some interest we wouldn't be having this conversation now. This
finger pointing is ridiculous and getting us nowhere.

> Regardless, the burden is on me (and all developers who have a desire to
> see a caching/HSM driver get upstream) to evaluate bcache. That process
> has started -- hopefully it'll be as simple as:
>
> 1) put a DM target wrapper in place of your sysfs interface.
> 2) switch/port bcache's btree over to drivers/md/persistent-data/
> 3) dm-bcache FTW

Replacing bcache's persistent metadata code? Hah. That's the central
part of the design! Is this the way new filesystems are evaluated? No,
it's not. What makes you more special than ext4?

> One could dream.
>
> The little bit I've looked at bcache it already seems unrealistic; for
> starters you have the btree wired directly to bio submission.
> drivers/md/persistent-data/ offers a layered approach,
> dm-block-manager.c brokers the IO submission (via dm-bufio) so the
> management of the btree(s) doesn't need to be concerned with actual IO.
>
> bcache is _very_ tightly coupled with your btree implementation.

Yes, it is!
It really has to be: efficiently allocating buckets and invalidating
cached data rely on specific details of the btree implementation. The
btree is _central_ to bcache; ignoring that, the rest of the code isn't
all that interesting.

> > > That said, it is frustrating that you are content to continue doing your
> > > own thing because I'm now tasked with implementing a DM target for
> > > caching/HSM, as I touched on here:
> > > http://www.redhat.com/archives/linux-lvm/2012-March/msg00007.html
> >
> > Kind of presumptuous, don't you think?
>
> Not really, considering what I'm responding to at the moment ;)

Maybe you should consider how you word things...

> > I've nothing at all against collaborating, or you or other dm devs
> > adapting bcache code - I'd help out with that!
>
> OK.
>
> > But I'm just not going to write my code a certain way just to suit you.
>
> upstream kumbaya: more cooperative eyes on the problem, working to hook
> into established interfaces, will produce a solution that is worthy of
> upstream inclusion.

Let me be clear: all I care about is the best solution. I'm more than
happy to work with other people to achieve that, but I don't give a damn
about anything else.

> > Look forward to seeing the benchmarks.
>
> Speaking of which, weren't you saying you'd show bcache benchmarks in a
> previous LKML thread?

Yeah, I did, but as usual I got distracted. I'm travelling for the next
three weeks, but maybe I can get someone else to get some numbers that
we can publish...
* Re: Bcache
From: Mike Snitzer @ 2012-03-16 1:45 UTC (permalink / raw)
To: Kent Overstreet
Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel, Christoph Hellwig

On Thu, Mar 15 2012 at 6:59pm -0400,
Kent Overstreet <koverstreet@google.com> wrote:

> I don't really care what you think my priorities should be. I write code
> first and foremost for myself, and the one thing I care about is good
> code.
>
> I'd love to have bcache in mainline, seeing more use and getting more
> improvements - but if that's contingent on making it work through dm,
> sorry, not interested.
>
> If you want to convince me that dm is the right way to go you'll have
> much better luck with technical arguments.

We have quite a lot of code that illustrates how to implement DM
targets. DM isn't forcing undue or cumbersome constraints that prevent
its use for complex targets with in-kernel metadata -- again, dm-thinp
proves this. It is your burden to even begin to substantiate _why_ both
DM and MD are inadequate frameworks for virtual block device drivers.

> I'm not the only one who's read the dm code and found it lacking - and
> anyways, I'm not really out to convince anyone.

Like other kernel code, DM is approachable for those who are willing to
put the time in to understand it. Your hand-waving (and now proxy)
critiques leave us nothing to work with.

> > > Kind of presumptuous, don't you think?
> > > > Not really, considering what I'm responding to at the moment ;) > > Maybe you should consider how you word things... Say what? Nice projection. Luckily the thread is public for all to see. I initially thought Christoph's feedback in this thread was harsh; now it seems eerily prophetic. Let's stop wasting our time on this thread. Maybe we can be more constructive in the future. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 17:24 ` Kent Overstreet 2012-03-14 22:01 ` Bcache Mike Snitzer @ 2012-03-15 19:43 ` Vivek Goyal 2012-03-15 23:46 ` Kent Overstreet 1 sibling, 1 reply; 21+ messages in thread From: Vivek Goyal @ 2012-03-15 19:43 UTC (permalink / raw) To: Kent Overstreet; +Cc: lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 01:24:08PM -0400, Kent Overstreet wrote: [..] > > Can you post the full log? There was a bug where if it encountered an > error during registration, it wouldn't wait for a uuid read or write > before tearing everything down - that's what your backtrace looks like > to me. > > You could try the bcache-3.2-dev branch, too. I have a newer branch > with a ton of bugfixes but I'm waiting until it's seen more testing > before I post it. Faced the same issue on bcache-3.2-dev branch too. login: [ 167.532932] bio: create slab <bio-1> at 1 [ 167.539071] bcache: invalidating existing data [ 167.547604] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC [ 167.548573] CPU 2 [ 167.548573] Modules linked in: floppy [last unloaded: scsi_wait_scan] [ 167.548573] [ 167.548573] Pid: 0, comm: swapper/2 Not tainted 3.2.0-bcache+ #4 Hewlett-Packard HP xw6600 Workstation/0A9Ch [ 167.548573] RIP: 0010:[<ffffffff8144d6fe>] [<ffffffff8144d6fe>] closure_put+0xe/0x20 [ 167.548573] RSP: 0018:ffff88013fc83c60 EFLAGS: 00010246 [ 167.548573] RAX: 0000000000000000 RBX: ffff8801385b04a0 RCX: 0000000000000000 [ 167.548573] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 6b6b6b6b6b6b6b6b [ 167.548573] RBP: ffff88013fc83c60 R08: 0000000000000000 R09: 0000000000000001 [ 167.548573] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 167.548573] R13: ffff880137719580 R14: 0000000000080000 R15: 0000000000000000 [ 167.548573] FS: 0000000000000000(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000 [ 167.548573] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 167.548573] CR2: 00007f6e84f70240 CR3: 000000013707d000 CR4: 
00000000000006e0 [ 167.548573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 167.548573] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 167.548573] Process swapper/2 (pid: 0, threadinfo ffff88013a454000, task ffff88013a458000) [ 167.548573] Stack: [ 167.548573] ffff88013fc83c80 ffffffff814448c6 ffffffff00000000 ffff8801385b04a0 [ 167.548573] ffff88013fc83c90 ffffffff8117ae8d ffff88013fc83cc0 ffffffff812e2273 [ 167.548573] ffff88013a454000 0000000000000000 ffff8801385b04a0 0000000000080000 [ 167.548573] Call Trace: [ 167.548573] <IRQ> [ 167.548573] [<ffffffff814448c6>] uuid_endio+0x36/0x40 [ 167.548573] [<ffffffff8117ae8d>] bio_endio+0x1d/0x40 [ 167.548573] [<ffffffff812e2273>] req_bio_endio+0x83/0xc0 [ 167.548573] [<ffffffff812e53e1>] blk_update_request+0x101/0x5c0 [ 167.548573] [<ffffffff812e5612>] ? blk_update_request+0x332/0x5c0 [ 167.548573] [<ffffffff812e58d1>] blk_update_bidi_request+0x31/0x90 [ 167.548573] [<ffffffff812e595c>] blk_end_bidi_request+0x2c/0x80 [ 167.548573] [<ffffffff812e59f0>] blk_end_request+0x10/0x20 [ 167.548573] [<ffffffff81458fdc>] scsi_io_completion+0x9c/0x5f0 [ 167.548573] [<ffffffff8144fcd0>] scsi_finish_command+0xb0/0xe0 [ 167.548573] [<ffffffff81458dc5>] scsi_softirq_done+0xa5/0x140 [ 167.548573] [<ffffffff812ec55b>] blk_done_softirq+0x7b/0x90 [ 167.548573] [<ffffffff810512ae>] __do_softirq+0xce/0x3c0 [ 167.548573] [<ffffffff817e84ac>] call_softirq+0x1c/0x30 [ 167.548573] [<ffffffff8100417d>] do_softirq+0x8d/0xc0 [ 167.548573] [<ffffffff810518de>] irq_exit+0xae/0xe0 [ 167.548573] [<ffffffff817e8bb3>] do_IRQ+0x63/0xe0 [ 167.548573] [<ffffffff817de1f0>] common_interrupt+0x70/0x70 [ 167.548573] <EOI> [ 167.548573] [<ffffffff8100a5f6>] ? mwait_idle+0xb6/0x490 [ 167.548573] [<ffffffff8100a5ed>] ? 
mwait_idle+0xad/0x490 [ 167.548573] [<ffffffff810011e6>] cpu_idle+0x96/0xe0 [ 167.548573] [<ffffffff817cb475>] start_secondary+0x1be/0x1c2 [ 167.548573] Code: ee 01 00 00 10 e8 03 ff ff ff 48 85 db 75 de 5b 41 5c 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 be ff ff ff ff <f0> 0f c1 77 48 83 ee 01 e8 d5 fe ff ff 5d c3 0f 1f 00 55 48 89 [ 167.548573] RIP [<ffffffff8144d6fe>] closure_put+0xe/0x20 [ 167.548573] RSP <ffff88013fc83c60> Thanks Vivek ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-15 19:43 ` [Lsf-pc] [Topic] Bcache Vivek Goyal @ 2012-03-15 23:46 ` Kent Overstreet 0 siblings, 0 replies; 21+ messages in thread From: Kent Overstreet @ 2012-03-15 23:46 UTC (permalink / raw) To: Vivek Goyal; +Cc: lsf-pc, nauman, linux-scsi, dm-devel On Thu, Mar 15, 2012 at 03:43:36PM -0400, Vivek Goyal wrote: > On Wed, Mar 14, 2012 at 01:24:08PM -0400, Kent Overstreet wrote: > > [..] > > > > Can you post the full log? There was a bug where if it encountered an > > error during registration, it wouldn't wait for a uuid read or write > > before tearing everything down - that's what your backtrace looks like > > to me. > > > > You could try the bcache-3.2-dev branch, too. I have a newer branch > > with a ton of bugfixes but I'm waiting until it's seen more testing > > before I post it. > > Faced the same issue on bcache-3.2-dev branch too. Shoot. Well, I know I fixed that bug (well, a bug with the same symptoms), and I guess that branch was kind of old. I just updated the bcache-3.2-dev branch to the newest vaguely possibly tested code; the 3.2 version is only build tested (we're still developing on 2.6.34). 
> > login: [ 167.532932] bio: create slab <bio-1> at 1 > [ 167.539071] bcache: invalidating existing data > [ 167.547604] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC > [ 167.548573] CPU 2 > [ 167.548573] Modules linked in: floppy [last unloaded: scsi_wait_scan] > [ 167.548573] > [ 167.548573] Pid: 0, comm: swapper/2 Not tainted 3.2.0-bcache+ #4 > Hewlett-Packard HP xw6600 Workstation/0A9Ch > [ 167.548573] RIP: 0010:[<ffffffff8144d6fe>] [<ffffffff8144d6fe>] > closure_put+0xe/0x20 > [ 167.548573] RSP: 0018:ffff88013fc83c60 EFLAGS: 00010246 > [ 167.548573] RAX: 0000000000000000 RBX: ffff8801385b04a0 RCX: > 0000000000000000 > [ 167.548573] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: > 6b6b6b6b6b6b6b6b > [ 167.548573] RBP: ffff88013fc83c60 R08: 0000000000000000 R09: > 0000000000000001 > [ 167.548573] R10: 0000000000000000 R11: 0000000000000000 R12: > 0000000000000000 > [ 167.548573] R13: ffff880137719580 R14: 0000000000080000 R15: > 0000000000000000 > [ 167.548573] FS: 0000000000000000(0000) GS:ffff88013fc80000(0000) > knlGS:0000000000000000 > [ 167.548573] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 167.548573] CR2: 00007f6e84f70240 CR3: 000000013707d000 CR4: > 00000000000006e0 > [ 167.548573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 167.548573] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 167.548573] Process swapper/2 (pid: 0, threadinfo ffff88013a454000, > task ffff88013a458000) > [ 167.548573] Stack: > [ 167.548573] ffff88013fc83c80 ffffffff814448c6 ffffffff00000000 > ffff8801385b04a0 > [ 167.548573] ffff88013fc83c90 ffffffff8117ae8d ffff88013fc83cc0 > ffffffff812e2273 > [ 167.548573] ffff88013a454000 0000000000000000 ffff8801385b04a0 > 0000000000080000 > [ 167.548573] Call Trace: > [ 167.548573] <IRQ> > [ 167.548573] [<ffffffff814448c6>] uuid_endio+0x36/0x40 > [ 167.548573] [<ffffffff8117ae8d>] bio_endio+0x1d/0x40 > [ 167.548573] [<ffffffff812e2273>] req_bio_endio+0x83/0xc0 > [ 
167.548573] [<ffffffff812e53e1>] blk_update_request+0x101/0x5c0 > [ 167.548573] [<ffffffff812e5612>] ? blk_update_request+0x332/0x5c0 > [ 167.548573] [<ffffffff812e58d1>] blk_update_bidi_request+0x31/0x90 > [ 167.548573] [<ffffffff812e595c>] blk_end_bidi_request+0x2c/0x80 > [ 167.548573] [<ffffffff812e59f0>] blk_end_request+0x10/0x20 > [ 167.548573] [<ffffffff81458fdc>] scsi_io_completion+0x9c/0x5f0 > [ 167.548573] [<ffffffff8144fcd0>] scsi_finish_command+0xb0/0xe0 > [ 167.548573] [<ffffffff81458dc5>] scsi_softirq_done+0xa5/0x140 > [ 167.548573] [<ffffffff812ec55b>] blk_done_softirq+0x7b/0x90 > [ 167.548573] [<ffffffff810512ae>] __do_softirq+0xce/0x3c0 > [ 167.548573] [<ffffffff817e84ac>] call_softirq+0x1c/0x30 > [ 167.548573] [<ffffffff8100417d>] do_softirq+0x8d/0xc0 > [ 167.548573] [<ffffffff810518de>] irq_exit+0xae/0xe0 > [ 167.548573] [<ffffffff817e8bb3>] do_IRQ+0x63/0xe0 > [ 167.548573] [<ffffffff817de1f0>] common_interrupt+0x70/0x70 > [ 167.548573] <EOI> > [ 167.548573] [<ffffffff8100a5f6>] ? mwait_idle+0xb6/0x490 > [ 167.548573] [<ffffffff8100a5ed>] ? mwait_idle+0xad/0x490 > [ 167.548573] [<ffffffff810011e6>] cpu_idle+0x96/0xe0 > [ 167.548573] [<ffffffff817cb475>] start_secondary+0x1be/0x1c2 > [ 167.548573] Code: ee 01 00 00 10 e8 03 ff ff ff 48 85 db 75 de 5b 41 5c > 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 be ff ff ff ff > <f0> 0f c1 77 48 83 ee 01 e8 d5 fe ff ff 5d c3 0f 1f 00 55 48 89 > [ 167.548573] RIP [<ffffffff8144d6fe>] closure_put+0xe/0x20 > [ 167.548573] RSP <ffff88013fc83c60> > > Thanks > Vivek ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 15:53 ` [Lsf-pc] " Vivek Goyal 2012-03-14 17:24 ` Kent Overstreet @ 2012-03-14 18:12 ` chetan loke 2012-03-14 18:17 ` Kent Overstreet 1 sibling, 1 reply; 21+ messages in thread From: chetan loke @ 2012-03-14 18:12 UTC (permalink / raw) To: Vivek Goyal; +Cc: Kent Overstreet, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 11:53 AM, Vivek Goyal <vgoyal@redhat.com> wrote: > On Wed, Mar 14, 2012 at 09:32:28AM -0400, Kent Overstreet wrote: >> I'm already registered to attend, but would it be too late in the >> process to give a talk? I'd like to give a short talk about bcache, what >> it does and where it's going (more than just caching). > > [CCing dm-devel list] > > I am curious if you considered writing a device mapper driver for this? If > yes, why that is not a good choice. It seems to be stacked device and device > mapper should be good at that. All the configuration through sysfs seems > little odd to me. I'm not a dm guru but a quick scan of flash-cache seems like it does what you are saying. Now, if performance isn't acceptable then hashes can be replaced with trees and what-not. Also no one would need to re-invent the stacking mechanism. I saw thin support (at least documented) for dm. Plus, no matter what cache you come up with you may have to persist/store the meta-data associated with it. And dm seems like the right place to abstract that. > Vivek Chetan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:12 ` chetan loke @ 2012-03-14 18:17 ` Kent Overstreet 2012-03-14 18:33 ` chetan loke 0 siblings, 1 reply; 21+ messages in thread From: Kent Overstreet @ 2012-03-14 18:17 UTC (permalink / raw) To: chetan loke; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 2:12 PM, chetan loke <loke.chetan@gmail.com> wrote: > I'm not a dm guru but a quick scan at flash-cache seems like it does > what you are saying. Now, if performance isn't acceptable then hashes > can be replaced with trees and what-not. Also no one would need to > re-invent the stacking mechanism. I saw thin support(atleast > documented) for dm. Plus, no matter what cache you come up with you > may have to persist/store the meta-data associated with it. And dm > seems like the right place to abstract that. Bcache kills flash cache on performance - bcache can do around a million iops on 4k random reads, and beats it on real world applications and hardware too (i.e. mysql). I'm not aware of any real features I'm missing out on by not using dm... ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:17 ` Kent Overstreet @ 2012-03-14 18:33 ` chetan loke 2012-03-14 18:41 ` Kent Overstreet 2012-03-14 18:54 ` Ted Ts'o 0 siblings, 2 replies; 21+ messages in thread From: chetan loke @ 2012-03-14 18:33 UTC (permalink / raw) To: Kent Overstreet; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 2:17 PM, Kent Overstreet <koverstreet@google.com> wrote: > On Wed, Mar 14, 2012 at 2:12 PM, chetan loke <loke.chetan@gmail.com> wrote: >> I'm not a dm guru but a quick scan at flash-cache seems like it does >> what you are saying. Now, if performance isn't acceptable then hashes >> can be replaced with trees and what-not. Also no one would need to >> re-invent the stacking mechanism. I saw thin support(atleast >> documented) for dm. Plus, no matter what cache you come up with you >> may have to persist/store the meta-data associated with it. And dm >> seems like the right place to abstract that. > > Bcache kills flash cache on performance - bcache can do around a > million iops on 4k random reads, and beats it on real world > applications and hardware too (i.e. mysql). > Don't get too carried away with the perf numbers. re-read what I said: "if performance isn't acceptable then hashes can be replaced with trees and what-not". > I'm not aware of any real features I'm missing out on by not using dm... But you are not explaining why dm is not the right stack. Just because it crashed when you tried doesn't mean it's not the right place. flash-cache works, doesn't it? flash-cache's limitation is because it's a dm-target or because it is using hashing or something else? There are start-ups who are doing quite great with SSD-cache+dm. So please stop kidding yourself. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:33 ` chetan loke @ 2012-03-14 18:41 ` Kent Overstreet 2012-03-14 18:47 ` Christoph Hellwig 2012-03-14 19:04 ` chetan loke 2012-03-14 18:54 ` Ted Ts'o 1 sibling, 2 replies; 21+ messages in thread From: Kent Overstreet @ 2012-03-14 18:41 UTC (permalink / raw) To: chetan loke; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 2:33 PM, chetan loke <loke.chetan@gmail.com> wrote: > On Wed, Mar 14, 2012 at 2:17 PM, Kent Overstreet <koverstreet@google.com> wrote: >> On Wed, Mar 14, 2012 at 2:12 PM, chetan loke <loke.chetan@gmail.com> wrote: >>> I'm not a dm guru but a quick scan at flash-cache seems like it does >>> what you are saying. Now, if performance isn't acceptable then hashes >>> can be replaced with trees and what-not. Also no one would need to >>> re-invent the stacking mechanism. I saw thin support(atleast >>> documented) for dm. Plus, no matter what cache you come up with you >>> may have to persist/store the meta-data associated with it. And dm >>> seems like the right place to abstract that. >> >> Bcache kills flash cache on performance - bcache can do around a >> million iops on 4k random reads, and beats it on real world >> applications and hardware too (i.e. mysql). >> > > Don't get too carried away with the perf numbers. re-read what I said: > "if performance isn't acceptable then hashes can be replaced with > trees and what-not". Nobody's stopping you. >> I'm not aware of any real features I'm missing out on by not using dm... > > But you are not explaining why dm is not the right stack. Just because > it crashed when you tried doesn't mean it's not the right place. > flash-cache works, doesn't it? flash-cache's limitation is because > it's a dm-target or because it is using hashing or something else? > There are start-ups who are doing quite great with SSD-cache+dm. So > please stop kidding yourself. If you want me to implement bcache differently, shouldn't you explain why? 
I'm not sure why I _have_ to justify my decisions to you. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:41 ` Kent Overstreet @ 2012-03-14 18:47 ` Christoph Hellwig 2012-03-14 19:04 ` chetan loke 1 sibling, 0 replies; 21+ messages in thread From: Christoph Hellwig @ 2012-03-14 18:47 UTC (permalink / raw) To: Kent Overstreet Cc: chetan loke, nauman, dm-devel, lsf-pc, linux-scsi, Vivek Goyal On Wed, Mar 14, 2012 at 02:41:35PM -0400, Kent Overstreet wrote: > If you want me to implement bcache differently, shouldn't you explain > why? I'm not sure why I _have_ to justify my decisions to you. You don't have to - unless you want to get bcache merged into the mainline kernel. If you don't want to you probably shouldn't bother appearing at the LSF, though. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:41 ` Kent Overstreet 2012-03-14 18:47 ` Christoph Hellwig @ 2012-03-14 19:04 ` chetan loke 2012-03-15 17:01 ` Kent Overstreet 1 sibling, 1 reply; 21+ messages in thread From: chetan loke @ 2012-03-14 19:04 UTC (permalink / raw) To: Kent Overstreet; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 2:41 PM, Kent Overstreet <koverstreet@google.com> wrote: > If you want me to implement bcache differently, shouldn't you explain why Relax. I explained it already but you are defensive about your code. flash-cache works, period. And I may be wrong but it is GPL'd. If perf is an issue then at least let everyone know how it can be improved rather than saying my way or the highway. Aren't you saying in your patches - support for thin prov etc.? But if dm provides it then why are you duplicating code? > why? I'm not sure why I _have_ to justify my decisions to you. Others might want to contribute to it and not just consume it. This ain't your local sandbox. So it's quite common to get such questions when you are trying to add new functionality. Maybe I missed some of your emails. If so point me to them. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 19:04 ` chetan loke @ 2012-03-15 17:01 ` Kent Overstreet 0 siblings, 0 replies; 21+ messages in thread From: Kent Overstreet @ 2012-03-15 17:01 UTC (permalink / raw) To: chetan loke; +Cc: Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 03:04:52PM -0400, chetan loke wrote: > On Wed, Mar 14, 2012 at 2:41 PM, Kent Overstreet <koverstreet@google.com> wrote: > > If you want me to implement bcache differently, shouldn't you explain why > > relax. I explained it already but you are defensive about your code. > flash-cache works, period. And I may be wrong but it is GPL'd. If perf > is an issue then atleast let everyone know how it can be improved > rather than saying my way or the highway. aren't you saying in your > patches - support for thin prov etc. but if dm provides it then why > are you duplicating code? I'm not defensive about my code; you asked why someone would be interested in bcache vs. flash cache, and performance is the most obvious reason. Seems kind of ridiculous to then accuse me of being defensive. If you want to know how the performance of flash cache can be improved, bcache's design is documented and the code is available. I'm not interested in flash cache and improving it isn't my job; furthermore bcache's performance comes from fundamental design decisions so I don't think flash cache is ever going to approach bcache's performance. > > why? I'm not sure why I _have_ to justify my decisions to you. > > Others might want to contribute to it and not just consume it. This > ain't your local sandbox. So it's quite common to get such questions > when you are trying to add new functionality. May be I missed some of > your emails. If so point me to them. Helping others get involved is rather different - I'm perfectly fine to help anyone who's interested, and I spent quite a lot of time documenting and explaining the code, and helping users out. 
But I'm just not interested in justifying bcache's existence vs. flashcache. If you like flashcache better, it's no skin off my back. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:33 ` chetan loke 2012-03-14 18:41 ` Kent Overstreet @ 2012-03-14 18:54 ` Ted Ts'o 2012-03-14 19:22 ` chetan loke 2012-03-15 17:02 ` Kent Overstreet 1 sibling, 2 replies; 21+ messages in thread From: Ted Ts'o @ 2012-03-14 18:54 UTC (permalink / raw) To: chetan loke Cc: Kent Overstreet, Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 02:33:25PM -0400, chetan loke wrote: > But you are not explaining why dm is not the right stack. Just because > it crashed when you tried doesn't mean it's not the right place. > flash-cache works, doesn't it? flash-cache's limitation is because > it's a dm-target or because it is using hashing or something else? > There are start-ups who are doing quite great with SSD-cache+dm. So > please stop kidding yourself. SATA-attached flash is not the only kind of flash out there you know. There is also PCIe-attached flash which is a wee bit faster (where wee is defined as multiple orders of magnitude --- SATA-attached SSD's typically have thousands of IOPS; Fusion I/O is shipping product today with hundreds of thousands of IOPS, and has demonstrated a billion IOPS early this year). And Fusion I/O isn't the only company shipping PCIe-attached flash products. Startups may be doing great on SSD's; you may want to accept the fact that there is stuff which is way, way, way better out there than SSD's which are available on the market *today*. And it's not like bcache is a new project. It's working code, just like flash cache is today. So it's not like it needs to justify its existence. Best regards, - Ted ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:54 ` Ted Ts'o @ 2012-03-14 19:22 ` chetan loke 2012-03-15 17:02 ` Kent Overstreet 1 sibling, 0 replies; 21+ messages in thread From: chetan loke @ 2012-03-14 19:22 UTC (permalink / raw) To: Ted Ts'o, chetan loke, Kent Overstreet, Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 2:54 PM, Ted Ts'o <tytso@mit.edu> wrote: > On Wed, Mar 14, 2012 at 02:33:25PM -0400, chetan loke wrote: >> But you are not explaining why dm is not the right stack. Just because >> it crashed when you tried doesn't mean it's not the right place. >> flash-cache works, doesn't it? flash-cache's limitation is because >> it's a dm-target or because it is using hashing or something else? >> There are start-ups who are doing quite great with SSD-cache+dm. So >> please stop kidding yourself. > > SATA-attached flash is not the only kind of flash out there you know. > There is also PCIe-attached flash which is a wee bit faster (where wee > is defined as multiple orders of magnitude --- SATA-attached SSD's > typically have thousands of IOPS; Fusion I/O is shipping product today > with hundreds of thousands of IOPS, and has demonstrated a billion > IOPS early this year). And Fusion I/O isn't the only company shipping > PCIe-attached flash products. > We've designed linux targets with million IOPS even before PCIe-flash came into picture you know. So, I think we do know a thing or two about million IOPS and performance. when I said 'cache' I used it loosely. The backing store can be anything - a SSD or PCI-e or adjacent blade over IB. > Startups may be doing great on SSD's; you may want to accept the fact > that there is stuff which is way, way, way better out there than > SSD's which are available on the market *today*. > > And it's not like bache which is a new project. It's working code, > just like flash cache is today. So it's not like it needs to justify > its existence. > we are talking about approaches and not existence. 
> Best regards, > > - Ted BR, Chetan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Lsf-pc] [Topic] Bcache 2012-03-14 18:54 ` Ted Ts'o 2012-03-14 19:22 ` chetan loke @ 2012-03-15 17:02 ` Kent Overstreet 1 sibling, 0 replies; 21+ messages in thread From: Kent Overstreet @ 2012-03-15 17:02 UTC (permalink / raw) To: Ted Ts'o, chetan loke, Vivek Goyal, lsf-pc, nauman, linux-scsi, dm-devel On Wed, Mar 14, 2012 at 02:54:56PM -0400, Ted Ts'o wrote: > On Wed, Mar 14, 2012 at 02:33:25PM -0400, chetan loke wrote: > > But you are not explaining why dm is not the right stack. Just because > > it crashed when you tried doesn't mean it's not the right place. > > flash-cache works, doesn't it? flash-cache's limitation is because > > it's a dm-target or because it is using hashing or something else? > > There are start-ups who are doing quite great with SSD-cache+dm. So > > please stop kidding yourself. > > SATA-attached flash is not the only kind of flash out there you know. > There is also PCIe-attached flash which is a wee bit faster (where wee > is defined as multiple orders of magnitude --- SATA-attached SSD's > typically have thousands of IOPS; Fusion I/O is shipping product today > with hundreds of thousands of IOPS, and has demonstrated a billion > IOPS early this year). And Fusion I/O isn't the only company shipping > PCIe-attached flash products. > > Startups may be doing great on SSD's; you may want to accept the fact > that there is stuff which is way, way, way better out there than > SSD's which are available on the market *today*. > > And it's not like bache which is a new project. It's working code, > just like flash cache is today. So it's not like it needs to justify > its existence. > > Best regards, > > - Ted Thanks Ted, as usual you word things rather less abrasively than me :) ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2012-03-16 1:46 UTC | newest] Thread overview: 21+ messages -- 2012-03-14 13:32 [Topic] Bcache Kent Overstreet 2012-03-14 15:53 ` [Lsf-pc] " Vivek Goyal 2012-03-14 17:24 ` Kent Overstreet 2012-03-14 22:01 ` Bcache Mike Snitzer 2012-03-14 22:09 ` [Lsf-pc] Bcache Williams, Dan J 2012-03-15 17:27 ` Bcache Kent Overstreet 2012-03-15 20:17 ` Bcache Mike Snitzer 2012-03-15 22:59 ` Bcache Kent Overstreet 2012-03-16 1:45 ` Bcache Mike Snitzer 2012-03-15 19:43 ` [Lsf-pc] [Topic] Bcache Vivek Goyal 2012-03-15 23:46 ` Kent Overstreet 2012-03-14 18:12 ` chetan loke 2012-03-14 18:17 ` Kent Overstreet 2012-03-14 18:33 ` chetan loke 2012-03-14 18:41 ` Kent Overstreet 2012-03-14 18:47 ` Christoph Hellwig 2012-03-14 19:04 ` chetan loke 2012-03-15 17:01 ` Kent Overstreet 2012-03-14 18:54 ` Ted Ts'o 2012-03-14 19:22 ` chetan loke 2012-03-15 17:02 ` Kent Overstreet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox