From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Garzik
Subject: Re: tabled vs. BDB high availability
Date: Mon, 08 Mar 2010 07:36:40 -0500
Message-ID: <4B94EF58.5080409@garzik.org>
References: <4B942DFD.5020102@garzik.org> <20100307215625.36568e3c@redhat.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:received:received:sender:message-id:date:from
         :user-agent:mime-version:to:cc:subject:references:in-reply-to
         :content-type:content-transfer-encoding;
        bh=BI2wDQurDpe4sUYAsTBLuC4aecl7CSLA2XKRPHMrwyI=;
        b=qkpoF/5qjqvohvIahC35xg2JZtWlCCCgSWAvQH23EafspK+m1HtdURcWxRquO0FJMd
         U8ksYVzV4FTuyM0RRC1JYuXHuBhRK/8xHpQr3pwz5vVBu72uE4HeoIZLgmc6XJFazP+r
         w6fXpNtI9AdSzIqA3KXmoE3Xystg81XrXbP9I=
In-Reply-To: <20100307215625.36568e3c@redhat.com>
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Pete Zaitcev
Cc: Project Hail

On 03/07/2010 11:56 PM, Pete Zaitcev wrote:
> On Sun, 07 Mar 2010 17:51:41 -0500
> Jeff Garzik wrote:
>
>> If a site implements distinct endpoints for each tabled node
>> ("t1.example.com", "t2.example.com", etc.) then redirects should result
>> in directing clients to the current master, assuming that slaves have a
>> deterministic manner of discovering the current master.
>
> I did not implement it, but it's trivial in CLD. Just create a file
> named "master" in the group. For added cleverness, split up the
> notions of "group master" (who owns the file and most of the DB,
> and knows about each master for each bucket), and "bucket master".
>
>> Such a setup also makes use of IP Virtual Server impossible.
>
> Bah, big deal.

It is, if we actually want to attract users.

>> But that brings us to our second problem, a common problem in computer
>> science: the thundering herd.
>>
>> When a tabled endpoint crashes or loses its master status, clients must
>> move en masse to the new master. As client counts increase, this
>> becomes a "thundering herd" DDoS'ing the new target machine.
>
> Not a problem in practice, I expect. S3 clients are not clients that
> sit connected and then are notified about a failover. Instead, they
> connect, perform operations as fast as they can, quit. Therefore,
> there is not going to be a spike in traffic because of the failover
> that is significantly greater than the normal operations rate.

Think standard web browser behavior, including HTTP 1.1 pipelining and
extended connections... Think also about the length of time it takes
to negotiate a new master, and what the clients will do in the meantime.
Major thundering herd.

>> Ideally, we want to enable writing on every tabled node in a cell.
>> Given that the metadata is the only bit that _must_ be performed on the
>> master, it seems like the least-effort, least-cost solution for us is
>> for slaves to send a "write metadata" message to the master, and then
>> perform the data write itself.
>
> I would not do it, at least not yet. A better effect would be to
> have separate DBs with separate masters for each bucket.

No, that just multiplies the problems already inherent in the current
design, as well as creating new problems. BDB just isn't built for
that, so scaling that solution is a major problem. A bucket should be
a scalable unit, and that does nothing to solve it. Whereas if we solve
the current problem described in $thread, buckets are automatically
scalable as well.

> Another thing, how many clients do you think tabled is going to
> have accessing it at any given time in any realistic deployments
> for years to come? How about ONE (although, it may be multi-threaded)?
>
> One retarded thing we can do now is to rush into implementing things
> like slave-to-master metadata forwarding when we do not have a single
> installation to guide us.

I am glad Apache httpd hackers never set such strict, low goals :)

tabled is a web server, with all that entails, because the S3 API is
often used to front a web site (or at least the static portion thereof).
Standard web browser behavior and multiple, pipelining clients are part
of tabled's client base.

If we fail to understand and solve problems that people already solved
ten years ago, then tabled certainly will not attract end-user
installations. Understanding standard web server design, and the
problems and solutions that arose from it, is very important.

	Jeff