From: Pete Zaitcev <zaitcev@redhat.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: Project Hail <hail-devel@vger.kernel.org>, zaitcev@redhat.com
Subject: Re: tabled vs. BDB high availability
Date: Sun, 7 Mar 2010 21:56:25 -0700
Message-ID: <20100307215625.36568e3c@redhat.com>
In-Reply-To: <4B942DFD.5020102@garzik.org>
On Sun, 07 Mar 2010 17:51:41 -0500
Jeff Garzik <jeff@garzik.org> wrote:
> If a site implements distinct endpoints for each tabled node
> ("t1.example.com", "t2.example.com", etc.) then redirects should result
> in directing clients to the current master, assuming that slaves have a
> deterministic manner of discovering the current master.
I did not implement it, but it's trivial in CLD: just create a file
named "master" in the group. For added cleverness, split the notion
of a "group master" (the node that owns the file and most of the DB,
and knows the master of each bucket) from that of a "bucket master".
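A minimal sketch of the "master" file idea, assuming a lock service
with exclusive locks on files whose contents other nodes can read.
LockService below is a hypothetical in-memory stand-in, NOT the CLD
client API; the path /tabled-group/master and the host names are
made up for illustration.

```python
class LockService:
    """Hypothetical in-memory stand-in for a CLD-like lock service (not the real CLD API)."""

    def __init__(self):
        self.files = {}   # path -> file contents
        self.owners = {}  # path -> node holding the exclusive lock

    def try_lock(self, path, node):
        """Try to take an exclusive lock on path; True if node now holds it."""
        if path in self.owners:
            return self.owners[path] == node
        self.owners[path] = node
        return True

    def write(self, path, node, data):
        # Only the lock holder may write the file.
        if self.owners.get(path) != node:
            raise PermissionError(f"{node} does not hold {path}")
        self.files[path] = data

    def read(self, path):
        return self.files.get(path)


MASTER_FILE = "/tabled-group/master"  # hypothetical path inside the group


def start_node(svc, node):
    """Try to become master; either way, return the name of the current master."""
    if svc.try_lock(MASTER_FILE, node):
        svc.write(MASTER_FILE, node, node)  # advertise ourselves as master
        return node
    return svc.read(MASTER_FILE)            # a slave just reads who the master is


def redirect_target(svc, node):
    """Return the host a slave should redirect clients to, or None on the master."""
    master = svc.read(MASTER_FILE)
    return None if master == node else master
```

With two nodes, whichever starts first (say t1.example.com) takes the
lock and writes its name; t2.example.com then finds it by reading the
file and can redirect clients there.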
> Such a setup also makes use of IP Virtual Server impossible.
Bah, big deal.
> But that brings us to our second problem, a common problem in computer
> science: the thundering herd.
>
> When a tabled endpoint crashes or loses its master status, clients must
> move en masse to the new master. As client counts increase, this
> becomes a "thundering herd" DDoS'ing the new target machine.
Not a problem in practice, I expect. S3 clients do not sit connected
waiting to be notified about a failover. Instead, they connect, perform
their operations as fast as they can, and quit. Therefore, a failover
should not cause a traffic spike significantly greater than the normal
operation rate.
> Ideally, we want to enable writing on every tabled node in a cell.
> Given that the metadata is the only bit that _must_ be performed on the
> master, it seems like the least-effort, least-cost solution for us is
> for slaves to send a "write metadata" message to the master, and then
> perform the data write itself.
I would not do it, at least not yet. A better approach would be to
have separate DBs with separate masters for each bucket.
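One way to picture per-bucket masters, assuming each bucket's DB is
guarded by its own lock file: whoever holds the lock is that bucket's
master. BucketRouter and the "master-<bucket>" naming are hypothetical
and only illustrate routing; replication is not modeled.

```python
class BucketRouter:
    """Hypothetical routing table: one lock, and thus one master, per bucket."""

    def __init__(self):
        self.owners = {}  # lock path -> node that holds it

    @staticmethod
    def lock_path(bucket):
        return f"master-{bucket}"  # hypothetical naming scheme

    def claim(self, bucket, node):
        """First node to claim a bucket becomes its master; returns that master."""
        return self.owners.setdefault(self.lock_path(bucket), node)

    def master_for(self, bucket):
        """Node that currently masters the bucket, or None if unclaimed."""
        return self.owners.get(self.lock_path(bucket))

    def release(self, bucket, node):
        """Drop the lock, e.g. when the holder dies and its session expires."""
        path = self.lock_path(bucket)
        if self.owners.get(path) == node:
            del self.owners[path]
```

Writes to different buckets can then land on different nodes, which
spreads the write load without forwarding metadata from every slave
to a single master.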
Another thing: how many clients do you think tabled is going to
have accessing it at any given time, in any realistic deployment,
for years to come? How about ONE (although it may be multi-threaded)?
One foolish thing we could do now is rush into implementing things
like slave-to-master metadata forwarding when we do not have a single
installation to guide us.
Anyway, my first priority is to make sure that slave mode works at all.
Currently, tabled simply exits if it cannot grab the master lock, instead
of signing up for lock notifications, etc.
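Roughly what "signing up for lock notifications" could look like
instead of exiting, assuming the lock service calls back waiters when
a lock is released. NotifyingLock, TabledNode and the callback
interface are hypothetical; the real CLD event mechanism is not shown.

```python
class NotifyingLock:
    """Hypothetical lock that notifies waiting nodes when it is released."""

    def __init__(self):
        self.holder = None
        self.waiters = []  # callbacks to run on release, in sign-up order

    def try_lock(self, node):
        if self.holder is None:
            self.holder = node
        return self.holder == node

    def subscribe(self, callback):
        self.waiters.append(callback)

    def release(self, node):
        if self.holder != node:
            return
        self.holder = None
        # Hand the current waiter list off, so re-subscriptions go to a fresh list.
        waiters, self.waiters = self.waiters, []
        for cb in waiters:
            cb()


class TabledNode:
    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.role = None

    def start(self):
        # Instead of exiting when the master lock is taken, run as a
        # slave and ask to be told when the lock is released.
        if self.lock.try_lock(self.name):
            self.role = "master"
        else:
            self.role = "slave"
            self.lock.subscribe(self.on_lock_released)

    def on_lock_released(self):
        # Retry; if another waiter won the race, we go back to waiting.
        self.start()
```

If t1 is master and t2, t3 are waiting, releasing t1's lock promotes
t2, while t3 re-subscribes and stays a slave. Demotion of the old
master is not modeled here.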
-- Pete