* Re: HAIL volunteer Rick Peralta
@ 2009-07-31 16:37 Rick Peralta
  2009-07-31 21:03 ` chunkd design notes (was Re: HAIL volunteer Rick Peralta) Jeff Garzik
  0 siblings, 1 reply; 2+ messages in thread
From: Rick Peralta @ 2009-07-31 16:37 UTC (permalink / raw)
  To: Project Hail; +Cc: Pete Zaitcev, Rick Peralta, jeff

Hi All,

Thanks for inviting me to the forum and thanks to you all for making things happen!

My father said, "don't change anything unless you know why".  Those words ring in my ears more and more after decades of system development.  It is my intention and hope to respect the wisdom of those words and be clear about what the objectives of any endeavor are (including sloth ;^).

The chunkd effort caught my eye for a variety of reasons.  It is functionally very much like something I advocated for a long time ago; it is a relatively simple yet powerful machine; and it may benefit from some redesign for performance (my personal specialty).

The question at hand is: What truly needs to be done?  Bugs are bugs and one can debate one solution over another, but in the end it's about getting things to work well.  Multithreading the transport layer is probably a good idea, but some diligence is due as to why.  There are any number of other open issues that also deserve some attention.  Coding is fine, but understanding what and why seems to be the first step.

To give us a common basis for evaluation, I'd like to suggest a standard platform as the context for discussion: the current implementation of chunkd, running on a standard server (probably with a 32-bit address space), with gigabit Ethernet and a single disk (good for about 25 MB/s and a 15 ms seek time).  More or different bulk storage, 10 GbE, IB or other high-bandwidth implementations and so forth can be treated as branches from the core model.
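
To make the reference platform concrete, here is a rough back-of-envelope sketch in Python.  The disk figures (25 MB/s, 15 ms seek) and the gigabit link are the ones above; everything else is an assumption for illustration: one seek per chunk, raw line rate with no protocol overhead, no file system caching, and arbitrary sample chunk sizes.

```python
# Back-of-envelope model of the reference platform.
# Disk and link figures come from the text; the one-seek-per-chunk
# model and the sample chunk sizes are illustrative assumptions.

DISK_BW = 25e6      # bytes/s, sustained disk transfer rate
SEEK_S = 15e-3      # seconds per seek
NET_BW = 1e9 / 8    # bytes/s, gigabit Ethernet raw line rate (no overhead)


def disk_time(chunk_bytes):
    """Time to read one chunk: one seek plus a sequential transfer."""
    return SEEK_S + chunk_bytes / DISK_BW


def effective_disk_bw(chunk_bytes):
    """Delivered bytes/s when every chunk costs exactly one seek."""
    return chunk_bytes / disk_time(chunk_bytes)


def break_even_chunk():
    """Chunk size at which seek time equals transfer time."""
    return SEEK_S * DISK_BW


if __name__ == "__main__":
    print(f"break-even chunk size: {break_even_chunk():.0f} bytes")
    for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        bw = effective_disk_bw(size)
        print(f"{size:>10d} B  {bw / 1e6:6.2f} MB/s  {bw / NET_BW:6.1%} of GbE")
```

Under these assumptions the break-even chunk size is about 375 KB: below that, seek time dominates.  Even with very large chunks the single disk tops out at 25 MB/s, well under the 125 MB/s raw rate of the link, so in this simple model the disk is the limiting resource rather than the network.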

The current implementation of chunkd generally resides in user space, on top of a standard file system (complete with caches, overhead and whatever else comes along).

PZ>
I have a short todo list for Chunk, after which I don't have
any particular plans:
 * Exit if CLD registration fails (maybe!).
 * Put ourhost into the CLD record, and the port.
 * Use base directory instead of Cell.
 * Switch to asprintf for CLD filenames, Geo.

FD>
Yes. I also think that chunkd should not do its own replication, as the
strategy may be domain/application dependent. Therefore I'd appreciate it if
chunkd would provide some kind of "copy(dst,sha)" function, to be able
to directly copy to another chunkd instance.
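
To help pin down what such a "copy(dst,sha)" might mean, here is a minimal sketch in Python.  It is not chunkd's API: ChunkStore and all of its methods are hypothetical, the store is an in-memory dictionary standing in for a chunkd instance, and it assumes chunks are addressed by the SHA-1 digest of their contents.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory stand-in for one chunkd instance."""

    def __init__(self):
        self._chunks = {}  # hex digest -> chunk bytes

    def put(self, data):
        """Store a chunk and return its SHA-1 hex digest (assumed key)."""
        key = hashlib.sha1(data).hexdigest()
        self._chunks[key] = data
        return key

    def get(self, sha):
        """Return the bytes of chunk `sha`; raises KeyError if absent."""
        return self._chunks[sha]

    def has(self, sha):
        """Report whether this instance already holds chunk `sha`."""
        return sha in self._chunks

    def copy(self, dst, sha):
        """Push chunk `sha` straight to `dst`; True if data was sent."""
        if dst.has(sha):
            return False  # destination already has it, nothing to send
        stored = dst.put(self._chunks[sha])
        if stored != sha:
            raise ValueError("digest mismatch after copy")
        return True


def replicate(src, targets, sha):
    """Example policy kept outside the store: copy to every target."""
    return [src.copy(dst, sha) for dst in targets]
```

The point of the sketch is the split of responsibilities: the store only knows how to move one chunk to one peer, while the choice of which peers and how many (replicate here) stays with the application.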

JG>
Hopefully all this is wrapped up into libcldc...

JG>
* total single-node volume size:  one cheap SATA hard drive
* total number of chunks:  ==
	total number of tabled objects / number of storage nodes
* distribution of chunk sizes:  dependent upon the application using tabled
* aggregate bandwidth:  dependent upon the application using tabled

fbp>
Might we put some numbers to this?
Most notable is typical chunk size and number of supported clients.
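
As a starting point for putting numbers to it, here is a small Python sketch of the relation JG gives (chunks per node equals tabled objects divided by storage nodes).  The sample inputs in the usage note are pure assumptions for illustration, not measurements from any deployment.

```python
def chunks_per_node(total_objects, storage_nodes):
    """JG's relation: tabled objects divided by number of storage nodes."""
    return total_objects / storage_nodes


def mean_chunk_size(volume_bytes, chunks):
    """Average chunk size that would exactly fill one node's volume."""
    return volume_bytes / chunks


if __name__ == "__main__":
    # Assumed inputs: 10 million objects, 4 nodes, a 1 TB drive per node.
    n = chunks_per_node(10_000_000, 4)
    print(f"chunks per node: {n:.0f}")
    print(f"mean chunk size to fill the drive: {mean_chunk_size(10**12, n):.0f} bytes")
```

With those assumed inputs each node holds 2.5 million chunks, and the drive is full at a mean chunk size of 400,000 bytes.  Real figures for typical chunk size and client count would replace the guesses.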

 - Rick Peralta
    www.linkedin.com/in/rickperalta

