From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Garzik
Subject: Re: Metadata replication in tabled
Date: Fri, 25 Jun 2010 03:26:05 -0400
Message-ID: <4C245A0D.4020204@garzik.org>
References: <20100624183123.08248d80@lembas.zaitcev.lan>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
 h=domainkey-signature:received:received:sender:message-id:date:from
 :user-agent:mime-version:to:cc:subject:references:in-reply-to
 :content-type:content-transfer-encoding;
 bh=zf/92BncJAtyaH1Tej19xU8t8HtTIJk+XaFvm0Uvcw8=;
 b=VHzTImkAjcY9HoT0AyFp1HSV7uFb/KrM9OVPlkvkqmALCCos6V+hnrYtrzzAyWxH0Q
 HhhzRCmr4biTajlXDE3ut9ucPS1ESjh/Mn+rwZcEfWO1Gzo7BtO2Fp0+KBRz/gB3BE8P
 fV9Th0UONV+enzVZZA6iw6+ZfpfkyzTkUC8yw=
In-Reply-To: <20100624183123.08248d80@lembas.zaitcev.lan>
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Pete Zaitcev
Cc: Project Hail List

On 06/24/2010 08:31 PM, Pete Zaitcev wrote:
> I worked on fixing the metadata replication in tabled. There were some
> difficulties in existing code, in particular the aliasing between the
> hostname used to identify nodes and the hostname used in bind() for
> listening was impossible to work around in repmgr. In the end I gave
> up on repmgr and switched tabled to the "Base" API. So, the replication
> works now... for some values of "works", which is still a progress.
>
> We essentially have a tabled that can really be considered as replicated.
> Before, it was only data replication, which was great and all but
> useless against disk failures in the tabled's database. I think it's
> a major threshold for tabled.

er, huh?

In addition to data replication, we already have metadata replication
via db4 repmgr in tabled.git, which ensures metadata db integrity in the
case of disk or tabled node failure.

The core problem with current tabled.git is that S3 clients expect all
nodes to support PUT/DELETE as well as GET. Our current use w/ db4
slave mode does not fulfill this client requirement.

Your work here, moving to the base replication API, eliminates several
obstacles on the path to making all tabled nodes support PUT/DELETE.
But it is not true to say that metadata replication did not exist prior
to this patch.

With either repmgr or base API, we still need to make failover more
transparent to our S3 clients.

> Unfortunately, the code is rather ugly. I tried to create a kind
> of an optional replication layer, so that tdbadm could be built
> without it. Although I succeeded, the result is a hideous mess of
> methods and callbacks, functions with side effects, and a bunch
> of poorly laid out state machines. In places I cannot wrap my own
> head around what's going on without a help of pencil and paper.
>
> So, while working, it's not ready for going in. Still, I'm going
> to throw it here in case I get hit by a bus, or if anyone wants
> an example of using db4 replication early.

Based on a quick read, it seems straightforward, and looks like
something I can try tomorrow...

Very excited to try this :)

	Jeff