From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeff Garzik <jeff@garzik.org>
Subject: Re: HAIL volunteer Rick Peralta
Date: Wed, 29 Jul 2009 13:17:34 -0400
Message-ID: <4A70842E.8020908@garzik.org>
References: <29025029.1248785350151.JavaMail.root@mswamui-andean.atl.sa.earthlink.net>	<4A6F5A3E.1070907@garzik.org>	<4A704376.6000303@tiac.net>	<4A705D5C.9050909@garzik.org> <20090729105202.7d0410de.zaitcev@redhat.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path: <hail-devel-owner@vger.kernel.org>
In-Reply-To: <20090729105202.7d0410de.zaitcev@redhat.com>
Sender: hail-devel-owner@vger.kernel.org
List-ID: <hail-devel.vger.kernel.org>
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Pete Zaitcev <zaitcev@redhat.com>
Cc: Rick Peralta <fbp@tiac.net>, Project Hail <hail-devel@vger.kernel.org>, zaitcev@redat.com

Pete Zaitcev wrote:
> On Wed, 29 Jul 2009 10:31:56 -0400, Jeff Garzik <jeff@garzik.org> wrote:
> 
>>> Is there someone taking the point for the chunkd development?
>> That's me, for the moment.  :)
> 
> I have some short list todo for Chunk, after which I don't have
> any particular plans:
>  * Exit if CLD registration fails (maybe!).

Hopefully all this is wrapped up into libcldc, so that an application 
only needs to worry about major, abstracted events after calling 
new-session:

* no master, after a defined "hunt" procedure.

	This includes both init and master failure (as distinguished
	from fail-over).

	The application will need to be in the "no CLD session"
	state in both cases.

	And indeed, exit() might be the best way to do that.

* master fail-over

	Flush our [currently non-existent] CLD cache.

etc.
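
To make the two events above concrete, here is a rough sketch of what the 
application side could look like.  None of these names exist in libcldc; 
the enum, the struct and the handler are all made up for illustration, 
and the "cache" is just a counter standing in for a cache we do not have 
yet.

```c
#include <stdbool.h>

/* Hypothetical event codes; NOT part of the real libcldc API. */
enum cld_app_event {
	CLD_EV_NO_MASTER,	/* "hunt" procedure finished, no master found */
	CLD_EV_MASTER_FAILOVER,	/* session continues on a different master */
};

/* Hypothetical per-application state. */
struct app_state {
	bool		have_session;	/* false == "no CLD session" state */
	unsigned int	cache_entries;	/* stand-in for a future CLD cache */
};

/*
 * Handle one abstracted CLD event.
 * Returns true if the caller should give up and exit().
 */
bool app_cld_event(struct app_state *st, enum cld_app_event ev)
{
	switch (ev) {
	case CLD_EV_NO_MASTER:
		/* covers both init failure and master failure */
		st->have_session = false;
		return true;

	case CLD_EV_MASTER_FAILOVER:
		/* session survives, but cached data may be stale */
		st->cache_entries = 0;
		return false;
	}

	return false;
}
```

The caller would decide what to do with a "true" return; exit() is just 
the simplest option.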


>  * Put ourhost into the CLD record, and the port.
>  * Use base directory instead of Cell.
>  * Switch to asprintf for CLD filenames, Geo.

agreed


> So far we managed hacking on same codebase with relative ease.
> Just make sure to post patches early.
> 
>> You should read the GoogleFS paper referenced on the chunkd wiki page: 
>> http://labs.google.com/papers/gfs-sosp2003.pdf  It describes the purpose 
>> and use of a chunk server, in the context of distributed cloud storage.
> 
> I think we're at a point where we have our own base of knowledge
> and evolved an overall architecture to the point we don't have to
> ape every little detail of Google architecture.

Well, until the wiki has a description of the basic idea of a chunk 
server, the Google paper will have to do.

The point is not that we are aping Google, but more to describe the 
general concept to someone who does not know what a chunk server is, and 
how a chunk server fits into the "grand design."


> In particular I'm
> going to fight hard any talk of Chunk doing its own replication,
> for now at least.

WRT chunkd and replication, yes, that's fine for version 1.0.

But consider which is more likely to have bandwidth to spare:

	a) client -> service
		or
	b) service -> service

Of the two, I'd say "a" is a bit more likely to be remote (WAN) and have 
a slow-upload situation like my home cable modem (1 Mbps down, 50 kbps 
up), and "b" is more likely to be LAN.

Or to take converse logic -- is it likely that service->service 
replication is SLOWER than client->service replication?

Every way I look at it, client->{service,service,service} replication 
seems both easy... and potentially slower than alternatives :)
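
A back-of-the-envelope version of that comparison, as a sketch.  The 
50 kbps uplink is the cable modem figure above; the object size, replica 
count and LAN speed are assumptions you would plug in yourself, and the 
model ignores latency, protocol overhead and any pipelining.

```c
/* Seconds to move `bytes` over a link of `kbps` kilobits per second. */
double xfer_secs(double bytes, double kbps)
{
	return (bytes * 8.0) / (kbps * 1000.0);
}

/* Model (a): the client uploads every replica itself over its WAN uplink. */
double client_fanout_secs(double bytes, int replicas, double wan_kbps)
{
	return replicas * xfer_secs(bytes, wan_kbps);
}

/*
 * Model (b): the client uploads once over the WAN, then the service
 * copies the remaining replicas over the LAN, one after another.
 */
double service_repl_secs(double bytes, int replicas,
			 double wan_kbps, double lan_kbps)
{
	return xfer_secs(bytes, wan_kbps) +
	       (replicas - 1) * xfer_secs(bytes, lan_kbps);
}
```

For a 1,000,000 byte object, three replicas and an assumed 100 Mbps LAN, 
client_fanout_secs(1e6, 3, 50) comes out to 480 seconds, while 
service_repl_secs(1e6, 3, 50, 100000) is roughly 160 seconds.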

	Jeff