From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Garzik
Subject: CLD future plans (was Re: [PATCH v1] CLD replication (WIP))
Date: Fri, 31 Jul 2009 16:41:32 -0400
Message-ID: <4A7356FC.8020108@garzik.org>
References: <20090731104031.GA21249@havoc.gtf.org> <4A733A26.2080801@garzik.org>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Sage Weil
Cc: hail-devel@vger.kernel.org

Sage Weil wrote:
> On Fri, 31 Jul 2009, Jeff Garzik wrote:
>> Sage Weil wrote:
>>> Hi Jeff,
>>>
>>> Do you still plan to replace bdb (and it's replication) with a something
>>> based on paxos? I'm considering replacing the Ceph monitors (which
>>> currently implement paxos, but in a very ceph-specific way) with cld if it
>>> can meet the basic requirements.
>>>
>>> What I'd kind of like to see is a clean implementation of a paxos
>>> library--one that leaves out message transport and storage--to build a
>>> replicated write-ahead log. And then a separate library for handling the
>>> database/namespace served up by cld (be it regular files, bdb, whatever)
>>> that leaves replication up to paxos. It looks like Google ended up doing
>>> something similar with Chubby (see
>>> http://labs.google.com/papers/paxos_made_live.html).
>>>
>>> Does this sound like the direction you guys are heading in?
>> You mean something like http://linux.yyz.us/misc/paxreg.c ? :)
>
> Yeah, for starters. I'm thinking of the larger problem of integrating of
> master elections, add leasing and timeouts (to avoid querying peers for
> reads), and so forth to make core paxos usable in a practical environment.
> And the glue to bind it to the database (snapshotting and log trimming,
> catch-up, etc.).

You have outlined my ideas for CLD version 2.0, essentially: create a
libpaxos and libpaxos_db, and use those in CLD. So, 100% agreed...
Unfortunately that is a lower priority for me than hammering out a
rock-solid CLD <-> cldc network protocol, and getting out a version 1.0 of
the CLD service with _some_ form of solid, working replication and master
fail-over.

I figured, for CLD version 1.0, db4 already went through the pain of
debugging a replicated database. Avoiding that myself would help get CLD
up and running much more rapidly.

I would be very happy to take PAXOS database patches from others, though ;-)

>> Most importantly, from the view of a CLD client (libcldc user), CLD will
>> provide the necessary guarantees today. When CLD switches to native PAXOS,
>> the CLD client API will not change at all. So, the switchover should be
>> transparent from the client's point of view.
>
> Of course. :)
>
> I'm going to look a bit more closely at what it'll take to moving ceph to
> cld, then. Among other things, it'll mean part of cldc in the kernel, but
> should be a net architectural improvement.

Cool! A couple kernel-related comments:

* some operations involved in master-discovery and master-failover, most
  notably DNS SRV lookups, you probably want to do in userspace

* libcldc is intentionally written such that you should be able to use
  lib/cldc.c in embedded applications (such as the kernel), and
  successfully ignore related modules cldc-udp.c and cldc-dns.c.

* as such, I am happy to take patches that get lib/cldc.c as close as
  possible to your kernel version of cldc core.

* Not strictly kernel-related, but I do need to adjust the license of CLD
  (and chunkd and tabled) libraries to be more friendly to linking with
  other applications. Presumably LGPL...

	Jeff