* RE: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Steven Dake @ 2004-08-11 18:46 UTC
To: Walker, Bruce J, linux-kernel
Cc: Discussion of clustering software components including GFS,
Chris Wright, dcl_discussion, cgl_discussion
On Wed, 2004-08-11 at 11:19, Walker, Bruce J wrote:
> > * John Cherry (cherry@osdl.org) wrote:
> > At the summit I attended, we also talked about using GFS as the initial
> > "consumer" of the cluster infrastructure. The cluster infrastructure
> > doesn't stand a chance of mainline acceptance without a consumer that
> > both validates the interfaces and hardens the services.
>
> Given cman etc. was written for GFS, it doesn't prove much that it works
> with GFS. Having an independent cluster effort (like OpenSSI) use the
> underlying infrastructure presents a much more compelling case. The
> OpenSSI project has started to look into this but help from OSDL, Intel
> and/or RedHat wouldn't be discouraged. Also, having SAF layered and/or
> ha-linux layered would also bolster the case as a general
> infrastructure.
>
> Bruce Walker
> OpenSSI project lead
>
>
Bruce,
I have looked over the cman protocol and find it suboptimal. Here is
how it works: it sends a message using multicast, starts a timer for the
message, and waits for an acknowledgement from every node in the
system. If the timer expires before every node has acknowledged, it
resends the message; once every node has responded, it deletes the timer.
This sort of protocol won't scale (imagine an ack from each of 32 nodes
for every message sent, and you can see why), and it won't work on
partitionable networks (a message may be delivered to some nodes in a
partition and not to others). It doesn't provide any strong membership
guarantees (the same message may be delivered under different membership
views) or delivery guarantees (messages only have FIFO ordering, whereas
distributed systems require agreed or safe ordering).
I am not sure if you have seen the openais project at
http://developer.osdl.org/dev/openais. It is a userland implementation
of the SA Forum's AIS interfaces. The openais project uses a technology
called virtual synchrony to provide cluster messaging, which solves the
above problems. The current protocol can achieve upwards of 7MB/sec of
multicast transfer from 1 node to 8 nodes with encryption and
authentication enabled; we use encryption and a MAC (message
authentication code) to ensure security. If you want to see the group
messaging API, look at exec/gmi.h in the source on the above website.
If we can't live with the cluster services in userland (although I'm
still not convinced of that), then at least the group messaging protocol
in the kernel could be based on 20 years of research in group messaging
and work properly under _all_ fault scenarios.
I'd invite you, or other interested parties, to port the openais code
for virtual synchrony to the kernel. I'd do it myself, but I'm focused on
implementing openais. Then, if the protocols ended up living in kernel
space, at least they would be secure and able to meet the needs of all
users (including cluster userland services). The group messaging
protocol code is compact (about 3500 lines); I'd expect that with the
kernel address-family code it would grow to about 4500.
Regards
-steve
>
>
>
> ______________________________________________________________________
> _______________________________________________
> cgl_discussion mailing list
> cgl_discussion@lists.osdl.org
> http://lists.osdl.org/mailman/listinfo/cgl_discussion
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-12 9:57 UTC
To: Steven Dake, Walker, Bruce J, linux-kernel
Cc: Chris Wright,
Discussion of clustering software components including GFS,
dcl_discussion, cgl_discussion
On 2004-08-11T11:46:03,
Steven Dake <sdake@mvista.com> said:
> If we can't live with the cluster services in userland (although I'm
> still not convinced), then at least the group messaging protocol in the
> kernel could be based upon 20 years of research in group messaging and
> work properly under _all_ fault scenarios.
Right. Another important alternative may be the Transis group
communication suite, which has now been released under the GPL/LGPL.
This all just highlights that we need to think about communication some
more before we can tackle it sensibly, but of course I'll be glad if
someone proves me wrong and Just Does It ;-)
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering \ I allow neither my experience
SUSE Labs, Research and Development | nor my cynicism to deter my
SUSE LINUX AG - A Novell company \ optimistic outlook on life.
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Steven Dake @ 2004-08-12 17:42 UTC
To: Lars Marowsky-Bree
Cc: Walker, Bruce J, linux-kernel, Chris Wright,
Discussion of clustering software components including GFS,
dcl_discussion, cgl_discussion
On Thu, 2004-08-12 at 02:57, Lars Marowsky-Bree wrote:
> On 2004-08-11T11:46:03,
> Steven Dake <sdake@mvista.com> said:
>
> > If we can't live with the cluster services in userland (although I'm
> > still not convinced), then atleast the group messaging protocol in the
> > kernel could be based upon 20 years of research in group messaging and
> > work properly under _all_ fault scenarios.
>
> Right. Another important alternative may be the Transis group
> communication suite, which has been released as GPL/LGPL now.
>
> This all just highlights that we need to think about communication some
> more before we can tackle it sensibly, but of course I'll be glad if
> someone proves me wrong and Just Does It ;-)
>
agreed... Transis in kernel would be a fine alternative to openais gmi
in kernel.
Speaking of transis, is the code posted anywhere? I'd like to have a
look.
Thanks
-steve
>
> Sincerely,
> Lars Marowsky-Brée <lmb@suse.de>
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-12 20:37 UTC
To: Steven Dake
Cc: Chris Wright,
Discussion of clustering software components including GFS,
dcl_discussion, Walker, Bruce J, linux-kernel, cgl_discussion
On 2004-08-12T10:42:16,
Steven Dake <sdake@mvista.com> said:
> agreed... Transis in kernel would be a fine alternative to openais gmi
> in kernel.
>
> Speaking of transis, is the code posted anywhere? I'd like to have a
> look.
It's not yet at the final location, but we put up what we got at
http://wiki.trick.ca/linux-ha/Transis .
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering \ This space /
SUSE Labs, Research and Development | intentionally |
SUSE LINUX AG - A Novell company \ left blank /
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Steven Dake @ 2004-08-12 22:59 UTC
To: Lars Marowsky-Bree
Cc: Chris Wright,
Discussion of clustering software components including GFS,
dcl_discussion, Walker, Bruce J, linux-kernel, cgl_discussion
On Thu, 2004-08-12 at 13:37, Lars Marowsky-Bree wrote:
> On 2004-08-12T10:42:16,
> Steven Dake <sdake@mvista.com> said:
>
> > agreed... Transis in kernel would be a fine alternative to openais gmi
> > in kernel.
> >
> > Speaking of transis, is the code posted anywhere? I'd like to have a
> > look.
>
> It's not yet at the final location, but we put up what we got at
> http://wiki.trick.ca/linux-ha/Transis .
>
>
Lars
Thanks for posting Transis. I had a look at the examples and API. The
API is of course different from openais and focused on a client/server
architecture.
I tried a performance test with two nodes by sending a 64k message and
then receiving it 10 times. This takes about 5 seconds on my hardware,
which is 128k/sec. I was expecting something more like 8-10MB/sec. Is
there anything that can be done to improve the performance?
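The 128k/sec figure follows from dividing the total payload by the elapsed time. A quick check in Python, assuming that "64k" means 64 KB, that all 10 receptions count toward the total, and that 1 MB = 1024 KB (the function name is hypothetical):

```python
def throughput_kb_per_s(msg_kb: float, receptions: int, seconds: float) -> float:
    """Average throughput in KB/s: total payload delivered / elapsed time."""
    return msg_kb * receptions / seconds


measured = throughput_kb_per_s(64, 10, 5.0)  # 640 KB in 5 s
print(measured)                              # 128.0
for expected_mb in (8, 10):
    # How many times slower than the expected rate (assumes 1 MB = 1024 KB).
    print(expected_mb, expected_mb * 1024 / measured)
# prints: 8 64.0, then 10 80.0
```

So the measured rate is roughly 64 to 80 times below the expected 8-10MB/sec.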
Thanks
-steve
Certainly a different sort of API than openais...
> Sincerely,
> Lars Marowsky-Brée <lmb@suse.de>
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-13 9:40 UTC
To: Steven Dake
Cc: Chris Wright,
Discussion of clustering software components including GFS,
dcl_discussion, Walker, Bruce J, linux-kernel, cgl_discussion
On 2004-08-12T15:59:10,
Steven Dake <sdake@mvista.com> said:
> Thanks for posting transis. I had a look at the examples and API. The
> API is of course different from openais and focused on client/server
> architecture.
Right.
> I tried a performance test by sending a 64k message, and then receiving
> it 10 times with two nodes. This operation takes about 5 seconds on my
> hardware which is 128k/sec. I was expecting more like 8-10MB/sec. Is
> there anything that can be done to improve the performance?
I've not yet done any real tests with it, so I'm not sure. We were
mostly going from the theoretical description ;) But I think 128k/s is
really a bit low, so I assume something ain't quite right yet... We'll
figure it out.
It's possible that it's not the way to go after all, but before we
could even look into it we needed it under the GPL/LGPL (to avoid
becoming IP-tainted).
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering \ This space /
SUSE Labs, Research and Development | intentionally |
SUSE LINUX AG - A Novell company \ left blank /
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Jonathan Stanton @ 2004-08-13 15:54 UTC
To: sdake, Discussion of clustering software components including GFS
Cc: Chris Wright, dcl_discussion, linux-kernel, cgl_discussion
Hi,
I just joined the linux-cluster list after seeing a few of the messages
that were cross-posted to linux-kernel.
On Thu, Aug 12, 2004 at 03:59:10PM -0700, Steven Dake wrote:
> Lars
>
> Thanks for posting transis. I had a look at the examples and API. The
> API is of course different from openais and focused on client/server
> architecture.
If you haven't looked at it already, you might want to try out the Spread
group communication system.
http://www.spread.org/
It is conceptually, though not code-wise, a descendant of the Transis
work (and of the Totem system from UCSB), and it is fairly widely used as
a production-quality group messaging system (some Apache modules use it,
as do a number of large web clusters, a few commercial clustered storage
systems, and a lot of custom replication apps). It is not under the GPL
but is open source under a BSD-style (though not identical) license.
Like transis it has a client-server architecture (and a simpler API).
> I tried a performance test by sending a 64k message, and then receiving
> it 10 times with two nodes. This operation takes about 5 seconds on my
> hardware which is 128k/sec. I was expecting more like 8-10MB/sec. Is
> there anything that can be done to improve the performance?
Given tests we ran a number of years ago, I would definitely expect
Transis to do better than 128k/s, but in LAN environments of up to medium
size the Totem/Spread protocols are generally faster, with less CPU
overhead. I know Spread could get 80Mb/s a number of years ago. We
recently re-ran a clean
set of benchmarks and wrote them up. You can find them at:
http://www.cnds.jhu.edu/pub/papers/cnds-2004-1.pdf
I admit some bias, as I'm one of the lead developers of Spread, and we
(the developers) have been building group messaging systems since the
early 90's, so I may look at things a bit differently. I would be very
interested in your thoughts on how you could use a GCS (group
communication system) and whether Spread would be useful.
Cheers,
Jonathan
--
-------------------------------------------------------
Jonathan R. Stanton jonathan@cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Lars Marowsky-Bree @ 2004-08-13 20:30 UTC
To: Jonathan Stanton, sdake,
Discussion of clustering software components including GFS
Cc: Chris Wright, dcl_discussion, cgl_discussion, linux-kernel
On 2004-08-13T11:54:41,
Jonathan Stanton <jonathan@cnds.jhu.edu> said:
> If you haven't looked at it already, you might want to try out the Spread
> group communication system.
>
> http://www.spread.org/
The Intel lawyers have identified the Spread license as
GPL-incompatible.
Otherwise, I agree, Spread is very nice. If those issues could be
resolved, that may be an interesting option too.
(I think the advertising clause and something else clash with the
(L)GPL; I can put you in contact with the Intel folks if you wish to
resolve this.)
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering \ This space /
SUSE Labs, Research and Development | intentionally |
SUSE LINUX AG - A Novell company \ left blank /
* Re: [Linux-cluster] Re: [cgl_discussion] Re: [dcl_discussion] Clustersummit materials
From: Jonathan Stanton @ 2004-08-13 22:53 UTC
To: Lars Marowsky-Bree
Cc: sdake, Discussion of clustering software components including GFS,
Chris Wright, dcl_discussion, cgl_discussion, linux-kernel
On Fri, Aug 13, 2004 at 10:30:29PM +0200, Lars Marowsky-Bree wrote:
> On 2004-08-13T11:54:41,
> Jonathan Stanton <jonathan@cnds.jhu.edu> said:
>
> > If you haven't looked at it already, you might want to try out the Spread
> > group communication system.
> >
> > http://www.spread.org/
>
> The Intel lawyers have identified the Spread license as
> GPL-incompatible.
>
> Otherwise, I agree, Spread is very nice. If those issues could be
> resolved, that may be an interesting option too.
>
> (I think the advertising clause and something else clash with the
> (L)GPL; I can put you in contact with the Intel folks if you wish to
> resolve this.)
I would appreciate that. We did choose our licensing for what I think are
good reasons, but we have also worked in the past with outside projects
with possible license conflicts and have been able to resolve them. So I
would like to understand exactly what the issues are.
Cheers,
Jonathan
--
-------------------------------------------------------
Jonathan R. Stanton jonathan@cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------