* RE: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Steven Dake @ 2004-08-11 18:46 UTC
To: Walker, Bruce J, linux-kernel
Cc: Discussion of clustering software components including GFS, Chris Wright, dcl_discussion, cgl_discussion

On Wed, 2004-08-11 at 11:19, Walker, Bruce J wrote:
> > * John Cherry (cherry@osdl.org) wrote:
> > At the summit I attended, we also talked about using GFS as the initial
> > "consumer" of the cluster infrastructure. The cluster infrastructure
> > doesn't stand a chance of mainline acceptance without a consumer that
> > both validates the interfaces and hardens the services.
>
> Given cman etc. was written for GFS, it doesn't prove much that it works
> with GFS. Having an independent cluster effort (like OpenSSI) use the
> underlying infrastructure presents a much more compelling case. The
> OpenSSI project has started to look into this but help from OSDL, Intel
> and/or RedHat wouldn't be discouraged. Also, having SAF layered and/or
> ha-linux layered would also bolster the case as a general
> infrastructure.
>
> Bruce Walker
> OpenSSI project lead

Bruce,

I have looked over the cman protocol and find it is suboptimal. Here is
how it works: it sends a message using multicast, starts a timeout for
the message, and waits for an acknowledgement from every node in the
system. If the timer expires, it resends the message. If every node
responds, it deletes the timer. This sort of protocol won't scale
(imagine an ack from 32 nodes for every message sent, and you can
understand why) and won't work within partitionable networks (some
messages may be delivered to some nodes in the partition and not
others). It doesn't provide any sort of strong membership guarantees
(the same message may be delivered under various membership views) or
delivery guarantees (messages only have FIFO guarantees, when
distributed systems require agreed or safe ordering).

I am not sure if you have seen the openais project at
http://developer.osdl.org/dev/openais. It is a userland implementation
of the SA Forum's AIS interfaces. The openais project uses a technology
called virtual synchrony to provide cluster messaging and solves the
above problems. The current protocol can obtain upwards of 7MB/sec
multicast transfer from 1 node to 8 nodes with encryption and
authentication. We use encryption and MAC to ensure security. If you
want to see the group messaging API, look at exec/gmi.h from the source
on the above website.

If we can't live with the cluster services in userland (although I'm
still not convinced), then at least the group messaging protocol in the
kernel could be based upon 20 years of research in group messaging and
work properly under _all_ fault scenarios.

I'd invite you, or other interested parties, to port the openais code
for virtual synchrony to the kernel. I'd do it myself, but I'm focused
on implementing openais. Then, if the protocols ended up living in
kernel space, at least the protocols would be secure and be able to
meet the needs of all users (including cluster userland services).

The group messaging protocol code is compact (about 3500 lines). I'd
expect that with the address family kernel stuff, it would increase to
4500 or so.

Regards
-steve
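Editor's sketch: the ack-from-every-node scheme Steven describes above can be modelled in a few lines to show where the per-message ack traffic comes from. This is a toy model only, not cman's code. The class `AckAllSender`, the helper `run`, and the rule that every receiver acks again after a resend are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AckAllSender:
    """Toy model of 'multicast, then wait for an ack from every node'.

    Hypothetical names; this does not mirror any real cman structure.
    """
    receivers: set
    pending: set = field(default_factory=set)
    multicasts: int = 0
    acks: int = 0

    def send(self):
        # First transmission: every receiver still owes an ack.
        self.pending = set(self.receivers)
        self.multicasts += 1

    def on_ack(self, node):
        # Count every ack that arrives, including duplicates after a resend.
        self.acks += 1
        self.pending.discard(node)

    def on_timeout(self):
        # Timer fired: resend only if some ack is still outstanding.
        if self.pending:
            self.multicasts += 1
            return True
        return False


def run(n_receivers, drop_first_ack_from=()):
    """Send one message; return (multicasts sent, acks received)."""
    sender = AckAllSender(set(range(n_receivers)))
    sender.send()
    for node in sender.receivers:
        if node not in drop_first_ack_from:
            sender.on_ack(node)
    while sender.on_timeout():
        # Assumption: after a resend, every receiver acks again.
        for node in sender.receivers:
            sender.on_ack(node)
    return sender.multicasts, sender.acks


if __name__ == "__main__":
    print(run(32))        # no loss
    print(run(32, {5}))   # one ack lost on the first attempt
```

Under these assumptions a single message to 32 receivers costs 32 acks with no loss, and one lost ack nearly doubles that, which is the scaling concern raised in the message.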
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-12 9:57 UTC
To: Steven Dake, Walker, Bruce J, linux-kernel
Cc: Chris Wright, Discussion of clustering software components including GFS, dcl_discussion, cgl_discussion

On 2004-08-11T11:46:03,
   Steven Dake <sdake@mvista.com> said:

> If we can't live with the cluster services in userland (although I'm
> still not convinced), then at least the group messaging protocol in the
> kernel could be based upon 20 years of research in group messaging and
> work properly under _all_ fault scenarios.

Right. Another important alternative may be the Transis group
communication suite, which has been released as GPL/LGPL now.

This all just highlights that we need to think about communication some
more before we can tackle it sensibly, but of course I'll be glad if
someone proves me wrong and Just Does It ;-)

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

--
High Availability & Clustering      \ I allow neither my experience
SUSE Labs, Research and Development | nor my cynicism to deter my
SUSE LINUX AG - A Novell company    \ optimistic outlook on life.
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Steven Dake @ 2004-08-12 17:42 UTC
To: Lars Marowsky-Bree
Cc: Walker, Bruce J, linux-kernel, Chris Wright, Discussion of clustering software components including GFS, dcl_discussion, cgl_discussion

On Thu, 2004-08-12 at 02:57, Lars Marowsky-Bree wrote:
> On 2004-08-11T11:46:03,
>    Steven Dake <sdake@mvista.com> said:
>
> > If we can't live with the cluster services in userland (although I'm
> > still not convinced), then at least the group messaging protocol in the
> > kernel could be based upon 20 years of research in group messaging and
> > work properly under _all_ fault scenarios.
>
> Right. Another important alternative may be the Transis group
> communication suite, which has been released as GPL/LGPL now.
>
> This all just highlights that we need to think about communication some
> more before we can tackle it sensibly, but of course I'll be glad if
> someone proves me wrong and Just Does It ;-)

Agreed... Transis in kernel would be a fine alternative to openais gmi
in kernel.

Speaking of Transis, is the code posted anywhere? I'd like to have a
look.

Thanks
-steve

> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-12 20:37 UTC
To: Steven Dake
Cc: Chris Wright, Discussion of clustering software components including GFS, dcl_discussion, Walker, Bruce J, linux-kernel, cgl_discussion

On 2004-08-12T10:42:16,
   Steven Dake <sdake@mvista.com> said:

> agreed... Transis in kernel would be a fine alternative to openais gmi
> in kernel.
>
> Speaking of transis, is the code posted anywhere? I'd like to have a
> look.

It's not yet at the final location, but we put up what we got at
http://wiki.trick.ca/linux-ha/Transis .

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

--
High Availability & Clustering      \ This space  /
SUSE Labs, Research and Development | intentionally |
SUSE LINUX AG - A Novell company    \ left blank  /
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Steven Dake @ 2004-08-12 22:59 UTC
To: Lars Marowsky-Bree
Cc: Chris Wright, Discussion of clustering software components including GFS, dcl_discussion, Walker, Bruce J, linux-kernel, cgl_discussion

On Thu, 2004-08-12 at 13:37, Lars Marowsky-Bree wrote:
> On 2004-08-12T10:42:16,
>    Steven Dake <sdake@mvista.com> said:
>
> > agreed... Transis in kernel would be a fine alternative to openais gmi
> > in kernel.
> >
> > Speaking of transis, is the code posted anywhere? I'd like to have a
> > look.
>
> It's not yet at the final location, but we put up what we got at
> http://wiki.trick.ca/linux-ha/Transis .

Lars,

Thanks for posting Transis. I had a look at the examples and API. The
API is of course different than openais and focused on a client/server
architecture.

I tried a performance test by sending a 64k message, and then receiving
it 10 times with two nodes. This operation takes about 5 seconds on my
hardware, which is 128k/sec. I was expecting more like 8-10MB/sec. Is
there anything that can be done to improve the performance?

Thanks
-steve

Certainly a different sort of API than openais...

> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>
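Editor's sketch: the throughput figure in the message above follows from simple arithmetic, worked here so the gap to the expected rate is explicit. The function name is hypothetical, and the sketch assumes "k" means KiB and that 1 MB is 1024 KiB, which the message does not state.

```python
def throughput_kib_per_s(message_kib, repetitions, elapsed_s):
    """Total data moved divided by elapsed time, in KiB/s."""
    return message_kib * repetitions / elapsed_s


# Figures from the message: a 64k message, received 10 times, in about 5 s.
measured = throughput_kib_per_s(64, 10, 5)

# Expected range from the message: 8-10 MB/s (assumed 1 MB = 1024 KiB).
expected_low = 8 * 1024
expected_high = 10 * 1024

# How many times slower the measurement is than the expectation.
shortfall = (expected_low / measured, expected_high / measured)

if __name__ == "__main__":
    print(f"measured: {measured} KiB/s")
    print(f"shortfall: {shortfall[0]:.0f}x to {shortfall[1]:.0f}x")
```

Under these assumptions the measured rate is 128 KiB/s, between 64 and 80 times below the expected range.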
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-13 9:40 UTC
To: Steven Dake
Cc: Chris Wright, Discussion of clustering software components including GFS, dcl_discussion, Walker, Bruce J, linux-kernel, cgl_discussion

On 2004-08-12T15:59:10,
   Steven Dake <sdake@mvista.com> said:

> Thanks for posting transis. I had a look at the examples and API. The
> API is of course different than openais and focused on client/server
> architecture.

Right.

> I tried a performance test by sending a 64k message, and then receiving
> it 10 times with two nodes. This operation takes about 5 seconds on my
> hardware which is 128k/sec. I was expecting more like 8-10MB/sec. Is
> there anything that can be done to improve the performance?

I've not yet done any real tests with it, so I'm not sure. We were
mostly going from the theoretical description ;)

But I think 128k/s is really a bit low, so I assume something ain't
quite right yet... We'll figure it out.

It's possible that it's not the way to go after all, but before we
could go looking we first needed it as GPL/LGPL (to avoid becoming
IP-tainted).

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

--
High Availability & Clustering      \ This space  /
SUSE Labs, Research and Development | intentionally |
SUSE LINUX AG - A Novell company    \ left blank  /
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Jonathan Stanton @ 2004-08-13 15:54 UTC
To: sdake, Discussion of clustering software components including GFS
Cc: Chris Wright, dcl_discussion, linux-kernel, cgl_discussion

Hi,

I just joined the linux-cluster list after seeing a few of the messages
that were cross-posted to linux-kernel.

On Thu, Aug 12, 2004 at 03:59:10PM -0700, Steven Dake wrote:
> Lars
>
> Thanks for posting transis. I had a look at the examples and API. The
> API is of course different than openais and focused on client/server
> architecture.

If you haven't looked at it already, you might want to try out the
Spread group communication system.

http://www.spread.org/

It is, conceptually although not code-wise, a descendant of the Transis
work (and the Totem system from UCSB) and is relatively widely used as a
production-quality group messaging system (some Apache modules use it,
along with a number of large web clusters, a few commercial clustered
storage systems, and a lot of custom replication apps). It is not under
the GPL but is open source under a BSD-style (but not exactly the same)
license. Like Transis it has a client-server architecture (and a
simpler API).

> I tried a performance test by sending a 64k message, and then receiving
> it 10 times with two nodes. This operation takes about 5 seconds on my
> hardware which is 128k/sec. I was expecting more like 8-10MB/sec. Is
> there anything that can be done to improve the performance?

I would expect Transis to definitely do better than 128k/s given tests
we ran a number of years ago, but in LAN environments up to medium size
the Totem/Spread protocols are generally faster with less CPU overhead.
I know Spread could get 80Mb/s a number of years ago. We recently re-ran
a clean set of benchmarks and wrote them up. You can find them at:

http://www.cnds.jhu.edu/pub/papers/cnds-2004-1.pdf

I admit some bias as I'm one of the lead developers of Spread, and we
(the developers) have been building group messaging systems since the
early 90's -- so I may look at things a bit differently -- so I would be
very interested in your thoughts on how you could use GCS and whether
Spread would be useful.

Cheers,

Jonathan

--
-------------------------------------------------------
Jonathan R. Stanton         jonathan@cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-13 20:30 UTC
To: Jonathan Stanton, sdake, Discussion of clustering software components including GFS
Cc: Chris Wright, dcl_discussion, cgl_discussion, linux-kernel

On 2004-08-13T11:54:41,
   Jonathan Stanton <jonathan@cnds.jhu.edu> said:

> If you haven't looked at it already, you might want to try out the Spread
> group communication system.
>
> http://www.spread.org/

The Intel lawyers have identified the Spread license as
GPL-incompatible.

Otherwise, I agree, Spread is very nice. If those issues could be
resolved, that may be an interesting option too.

(I think the advertising clause and something else clash with the
(L)GPL; I can put you in contact with the Intel folks if you wish to
resolve this.)

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

--
High Availability & Clustering      \ This space  /
SUSE Labs, Research and Development | intentionally |
SUSE LINUX AG - A Novell company    \ left blank  /
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Jonathan Stanton @ 2004-08-13 22:53 UTC
To: Lars Marowsky-Bree
Cc: sdake, Discussion of clustering software components including GFS, Chris Wright, dcl_discussion, cgl_discussion, linux-kernel

On Fri, Aug 13, 2004 at 10:30:29PM +0200, Lars Marowsky-Bree wrote:
> On 2004-08-13T11:54:41,
>    Jonathan Stanton <jonathan@cnds.jhu.edu> said:
>
> > If you haven't looked at it already, you might want to try out the Spread
> > group communication system.
> >
> > http://www.spread.org/
>
> The Intel lawyers have identified the Spread license as
> GPL-incompatible.
>
> Otherwise, I agree, Spread is very nice. If those issues could be
> resolved, that may be an interesting option too.
>
> (I think the advertising clause and something else clash with the
> (L)GPL; I can put you in contact with the Intel folks if you wish to
> resolve this.)

I would appreciate that. We did choose our licensing for what I think
are good reasons, but we have also worked in the past with outside
projects with possible license conflicts and have been able to resolve
them. So I would like to understand exactly what the issues are.

Cheers,

Jonathan

--
-------------------------------------------------------
Jonathan R. Stanton         jonathan@cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------
Thread overview: 9+ messages
[not found] <3689AF909D816446BA505D21F1461AE4C75110@cacexc04.americas.cpqcorp.net>
2004-08-11 18:46 ` [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials Steven Dake
2004-08-12 9:57 ` Lars Marowsky-Bree
2004-08-12 17:42 ` Steven Dake
2004-08-12 20:37 ` Lars Marowsky-Bree
2004-08-12 22:59 ` Steven Dake
2004-08-13 9:40 ` Lars Marowsky-Bree
2004-08-13 15:54 ` Jonathan Stanton
2004-08-13 20:30 ` Lars Marowsky-Bree
2004-08-13 22:53 ` Jonathan Stanton