From: Steven Dake <sdake@mvista.com>
To: Daniel Phillips <phillips@arcor.de>
Cc: Daniel Phillips <phillips@redhat.com>,
	David Teigland <teigland@redhat.com>,
	linux-kernel@vger.kernel.org, Lars Marowsky-Bree <lmb@suse.de>
Subject: Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30
Date: Sat, 10 Jul 2004 10:59:18 -0700
Message-ID: <1089482358.19787.14.camel@persist.az.mvista.com>
In-Reply-To: <200407100058.28599.phillips@arcor.de>

Comments inline, thanks.
-steve

On Fri, 2004-07-09 at 21:58, Daniel Phillips wrote:
> Hi Steven,
> 
> On Thursday 08 July 2004 15:41, Steven Dake wrote:
> > On Thu, 2004-07-08 at 11:22, Daniel Phillips wrote:
> > > While we're in here, could you please explain why CMAN needs to be
> > > kernel-based?  (Just thought I'd broach the question before Christoph
> > > does.)
> >
> > Daniel,
> >
> > I have that same question as well.  I can think of several
> > disadvantages:
> >
> > 1) security faults in the protocol can crash the kernel or violate
> >     system security
> > 2) secure group communication is difficult to implement in kernel
> >     - secure group key protocols can be implemented fairly easily in
> >        userspace using packages like openssl.  Implementing these
> >        protocols in kernel will prove to be very complex.
> > 3) live upgrades are much more difficult with kernel components
> > 4) a standard interface (the SA Forum AIS) is not being used,
> >     disallowing replaceability of components.  This is a big deal for
> >     people interested in clustering that don't want to be locked into
> >     a particular implementation.
> > 5) dlm, fencing, cluster messaging (including membership) can be done
> >     in userspace, so why not do it there.
> > 6) cluster services for the kernel and cluster services for applications
> >     will fork, because SA Forum AIS will be chosen for application
> >    level services.
> > 7) faults in the protocols can bring down all of Linux, instead of one
> >     cluster service on one node.
> > 8) kernel changes require much longer to get into the field and are
> >    much more difficult to distribute.  userspace applications are much
> >    simpler to unit test, qualify, and release.
> >
> > The advantages are:
> > interrupt driven timers
> > some possible reduction in latency related to the cost of executing a
> > system call when sending messages (including lock messages)
> 
> I'm not saying you're wrong, but I can think of an advantage you didn't 
> mention: a service living in kernel will inherit the PF_MEMALLOC state of the 
> process that called it, that is, a VM cache flushing task.  A userspace 
> service will not.  A cluster block device in kernel may need to invoke some 
> service in userspace at an inconvenient time.
> 
> For example, suppose somebody spills coffee into a network node while another 
> network node is in PF_MEMALLOC state, busily trying to write out dirty file 
> data to it.  The kernel block device now needs to yell to the user space 
> service to go get it a new network connection.  But the userspace service may 
> need to allocate some memory to do that, and, whoops, the kernel won't give 
> it any because it is in PF_MEMALLOC state.  Now what?
> 

Overload conditions that have caused the kernel to run low on memory are
a difficult problem, even for kernel components.  Currently openais
includes "memory pools" which preallocate data structures.  While that
work is not yet complete, the intent is to ensure every data area is
preallocated so the openais executive (the thing that does all of the
work) never requests extra memory once it becomes operational.
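
To make the idea concrete, here is a minimal sketch of a fixed-size
object pool.  It is not the openais code; the names (struct pool,
pool_init, pool_get, pool_put, pool_destroy) are hypothetical, and it
assumes obj_size is already a multiple of the alignment the objects
need.  All allocation happens in pool_init; after that, pool_get either
hands out a preallocated object or returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size object pool; not the openais implementation. */
struct pool {
    void *storage;      /* one contiguous slab, allocated at startup */
    void **free_list;   /* stack of pointers to currently free objects */
    size_t free_count;  /* number of entries on the free stack */
    size_t capacity;    /* total number of objects in the pool */
};

/* Allocate everything up front.  Returns 0 on success, -1 on failure. */
int pool_init(struct pool *p, size_t obj_size, size_t count)
{
    p->storage = calloc(count, obj_size);
    p->free_list = malloc(count * sizeof(void *));
    if (p->storage == NULL || p->free_list == NULL) {
        free(p->storage);
        free(p->free_list);
        return -1;
    }
    for (size_t i = 0; i < count; i++)
        p->free_list[i] = (char *)p->storage + i * obj_size;
    p->free_count = count;
    p->capacity = count;
    return 0;
}

/* Never allocates: returns NULL when the pool is exhausted. */
void *pool_get(struct pool *p)
{
    if (p->free_count == 0)
        return NULL;
    return p->free_list[--p->free_count];
}

/* Return an object previously obtained from pool_get. */
void pool_put(struct pool *p, void *obj)
{
    assert(p->free_count < p->capacity);
    p->free_list[p->free_count++] = obj;
}

/* Release the slab and the free stack at shutdown. */
void pool_destroy(struct pool *p)
{
    free(p->storage);
    free(p->free_list);
}
```

Note that on Linux, memory returned by malloc or calloc is not
necessarily backed by physical pages until it is touched, so a real
implementation would probably also need to touch or lock the pages
(mlock or mlockall) at startup.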

This, of course, still leaves problems in the following system calls,
which openais uses extensively:
sys_poll
sys_recvmsg
sys_sendmsg

These require the kernel to allocate memory with GFP_KERNEL; those
allocations can fail, returning ENOMEM to userland.  The openais
protocol can currently handle low-memory failures in recvmsg and
sendmsg, because it uses a protocol designed to operate on lossy
networks.
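
As a rough illustration of that idea (again, not the openais code; the
enum, classify_errno and send_datagram are hypothetical names), a sender
can treat a memory failure in sendmsg exactly like a packet lost on the
wire and leave recovery to the protocol's normal retransmission:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical result codes; not part of openais. */
enum send_result { SEND_OK, SEND_DROPPED, SEND_FATAL };

/* Decide how to react to an errno value from sendmsg(). */
enum send_result classify_errno(int err)
{
    switch (err) {
    case ENOMEM:   /* kernel could not allocate memory */
    case ENOBUFS:  /* no buffer space available */
    case EAGAIN:   /* socket buffer full, send would block */
        return SEND_DROPPED;  /* same as a lost packet: retransmit later */
    default:
        return SEND_FATAL;
    }
}

/* Send one datagram without blocking and without allocating in userspace. */
enum send_result send_datagram(int fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n;

    assert(buf != NULL);
    do {
        n = sendmsg(fd, &msg, MSG_DONTWAIT);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (n >= 0)
        return SEND_OK;
    return classify_errno(errno);
}
```

A caller that gets SEND_DROPPED does nothing special; the message stays
in the retransmit queue as if the network had lost it.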

The poll system call problem will be rectified by switching to
sys_epoll_wait, which does not allocate any memory (the poll data is
preallocated).
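
A minimal sketch of that pattern, assuming the descriptors are
registered once at startup with epoll_ctl and the event buffer is
supplied by the caller (watch_fd, wait_ready and MAX_EVENTS are
hypothetical names, not openais code):

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16

/* Register fd for readability.  Done once at startup, before the
 * system is under memory pressure. */
int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait for events into a caller-supplied buffer; retries on EINTR.
 * Returns the number of ready descriptors, 0 on timeout, -1 on error. */
int wait_ready(int epfd, struct epoll_event *events, int max, int timeout_ms)
{
    int n;

    assert(max > 0);
    do {
        n = epoll_wait(epfd, events, max, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

The main loop would then call wait_ready with a statically allocated
array of MAX_EVENTS entries, so the steady-state path does no
allocation in userspace either.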

I hope that helps, or at least shows that some R&D is underway to solve
this particular overload problem in userspace.

> > One of these projects, the openais project which I maintain, implements
> > 3 of these services (and the rest will be done in the timeframes we are
> > talking about) in user space without any kernel changes required.  It
> > would be possible with kernel to userland communication for the cluster
> > applications (GFS, distributed block device, etc) to use this standard
> > interface and implementation.  Then we could avoid all of the
> > unnecessary kernel maintenance and potential problems that come along
> > with it.
> >
> > Are you interested in such an approach?
> 
> We'd be remiss not to be aware of it, and its advantages.  It seems your 
> project is still in early stages.  How about we take pains to ensure that 
> your cluster membership service is pluggable into the CMAN infrastructure, as 
> a starting point.
> 
Sounds good.

> Though I admit I haven't read through the whole code tree, there doesn't seem 
> to be a distributed lock manager there.  Maybe that is because it's so 
> tightly coded I missed it?
> 

There is as yet no implementation of the SAF AIS dlock API in
openais.  The work requires about 4 weeks of development for someone
well-skilled.  I'd expect a contribution for this API in the timeframes
that make GFS interesting.

I'd invite you, or others interested in these sorts of services, to
contribute that code.  If you are interested in developing such a
service for openais, check out the developer's map (which describes
developing a service for openais) at:

http://developer.osdl.org/dev/openais/src/README.devmap

Thanks!
-steve

> Regards,
> 
> Daniel

