public inbox for linux-kernel@vger.kernel.org
From: Daniel Phillips <phillips@redhat.com>
To: sdake@mvista.com
Cc: Daniel Phillips <phillips@arcor.de>,
	David Teigland <teigland@redhat.com>,
	linux-kernel@vger.kernel.org, Lars Marowsky-Bree <lmb@suse.de>
Subject: Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30
Date: Sat, 10 Jul 2004 16:57:06 -0400
Message-ID: <200407101657.06314.phillips@redhat.com>
In-Reply-To: <1089482358.19787.14.camel@persist.az.mvista.com>

On Saturday 10 July 2004 13:59, Steven Dake wrote:
> > I'm not saying you're wrong, but I can think of an advantage you
> > didn't mention: a service living in kernel will inherit the
> > PF_MEMALLOC state of the process that called it, that is, a VM
> > cache flushing task.  A userspace service will not.  A cluster
> > block device in kernel may need to invoke some service in userspace
> > at an inconvenient time.
> >
> > For example, suppose somebody spills coffee into a network node
> > while another network node is in PF_MEMALLOC state, busily trying
> > to write out dirty file data to it.  The kernel block device now
> > needs to yell to the user space service to go get it a new network
> > connection.  But the userspace service may need to allocate some
> > memory to do that, and, whoops, the kernel won't give it any
> > because it is in PF_MEMALLOC state.  Now what?
>
> overload conditions that have caused the kernel to run low on memory
> are a difficult problem, even for kernel components.  Currently
> openais includes "memory pools" which preallocate data structures. 
> While that work is not yet complete, the intent is to ensure every
> data area is preallocated so the openais executive (the thing that
> does all of the work) doesn't ever request extra memory once it
> becomes operational.
>
> This of course, leads to problems in the following system calls which
> openais uses extensively:
> sys_poll
> sys_recvmsg
> sys_sendmsg
>
> which require the allocations of memory with GFP_KERNEL, which can
> then fail returning ENOMEM to userland.  The openais protocol
> currently can handle low memory failures in recvmsg and sendmsg. 
> This is because it uses a protocol designed to operate on lossy
> networks.
>
> The poll system call problem will be rectified by utilizing
> sys_epoll_wait which does not allocate any memory (the poll data is
> preallocated).
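
For illustration, the preallocation idea quoted above might look roughly
like the sketch below: a fixed buffer pool plus an epoll loop whose event
array is reserved up front.  All names and sizes here (pool_get, loop_wait,
POOL_SLOTS and so on) are hypothetical and are not taken from openais.
Static storage only avoids calls into the allocator; the pages still have
to be made resident (for example with mlockall) before they can be relied
on under memory pressure.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

#define POOL_SLOTS 64    /* hypothetical sizing */
#define SLOT_BYTES 1500  /* hypothetical: roughly one Ethernet frame */
#define MAX_EVENTS 16    /* hypothetical */

/* Fixed-size buffer pool.  All storage is static, so nothing is
 * requested from malloc once the program is running. */
struct slot {
    struct slot *next;
    unsigned char data[SLOT_BYTES];
};

struct slot pool_mem[POOL_SLOTS];
struct slot *pool_free;

/* Thread every slot onto the free list. */
void pool_init(void)
{
    pool_free = NULL;
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        pool_mem[i].next = pool_free;
        pool_free = &pool_mem[i];
    }
}

/* Returns NULL when the pool is exhausted; the caller has to shed load
 * instead of asking the system for more memory. */
void *pool_get(void)
{
    struct slot *s = pool_free;
    if (s == NULL)
        return NULL;
    pool_free = s->next;
    return s->data;
}

/* Return a buffer obtained from pool_get() to the free list. */
void pool_put(void *p)
{
    struct slot *s = (struct slot *)((unsigned char *)p -
                                     offsetof(struct slot, data));
    assert(s >= pool_mem && s < pool_mem + POOL_SLOTS);
    s->next = pool_free;
    pool_free = s;
}

/* The event array is reserved up front as well. */
struct epoll_event events[MAX_EVENTS];

int loop_create(void)
{
    return epoll_create1(0);
}

/* Register fd for readability.  Intended to be done once at startup,
 * since registration itself may allocate inside the kernel. */
int loop_add(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait for readiness; fills the static events[] array and returns the
 * number of ready descriptors, or -1 on error. */
int loop_wait(int epfd, int timeout_ms)
{
    return epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
}

void loop_close(int epfd)
{
    close(epfd);
}
```

A caller would run pool_init() and loop_add() during startup, then use only
pool_get(), pool_put() and loop_wait() in steady state.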

But if the userspace service is sitting in the kernel's dirty memory
writeout path, you have a real problem: the low memory condition may
never get resolved, leaving your userspace service unable to respond.
Meanwhile, whoever is generating the dirty memory just keeps spinning
and spinning, generating more of it, ensuring that if the system does
survive the first incident, there's another, worse traffic jam coming
down the pipe.  To trigger this deadlock, a kernel filesystem or block
device module just has to lose its cluster connection(s) at the wrong
time.
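
As a rough sketch of what a userspace service can do on its own side, it
can pin its pages and treat ENOMEM from sendmsg as ordinary packet loss,
along the lines below.  The names pin_memory and send_lossy are made up
for illustration.  None of this removes the deadlock described above: the
kernel may still need memory on the service's behalf, and the service does
not run with PF_MEMALLOC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Pin current and future pages so the service does not fault on its own
 * memory later.  Call once at startup; this can fail if the process lacks
 * the privilege or the RLIMIT_MEMLOCK headroom. */
int pin_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

enum send_result { SEND_OK, SEND_DROPPED, SEND_FATAL };

/* Send one datagram without blocking.  Transient resource failures are
 * reported as a drop, so a protocol built for lossy networks can simply
 * retransmit later. */
enum send_result send_lossy(int fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n;

    assert(buf != NULL);

    do {
        n = sendmsg(fd, &msg, MSG_DONTWAIT | MSG_NOSIGNAL);
    } while (n < 0 && errno == EINTR);

    if (n >= 0)
        return SEND_OK;
    if (errno == ENOMEM || errno == ENOBUFS ||
        errno == EAGAIN || errno == EWOULDBLOCK)
        return SEND_DROPPED;  /* treat like a lost packet */
    return SEND_FATAL;        /* e.g. bad descriptor or dead peer */
}
```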

> I hope that helps at least answer that some r&d is underway to solve
> this particular overload problem in userspace.

I'm certain there's a solution, but until it is demonstrated and proved, 
any userspace cluster services must be regarded with narrow squinty 
eyes.

> > Though I admit I haven't read through the whole code tree, there
> > doesn't seem to be a distributed lock manager there.  Maybe that is
> > because it's so tightly coded I missed it?
>
> There is as of yet no implementation of the SAF AIS dlock API in
> openais.  The work requires about 4 weeks of development for someone
> well-skilled.  I'd expect a contribution for this API in the
> timeframes that make GFS interesting.

I suspect you have underestimated the amount of development time 
required.

> I'd invite you, or others interested in these sorts of services, to
> contribute that code, if interested.

Humble suggestion: try grabbing the Red Hat (Sistina) DLM code and see 
if you can hack it to do what you want.  Just write a kernel module 
that exports the DLM interface to userspace in the desired form.

   http://sources.redhat.com/cluster/dlm/
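
To make the suggestion a little more concrete, here is one way the
userspace-facing side of such a module could be laid out, as an ioctl ABI
on a character device.  Everything in it (the dlmu_ prefix, the magic
number, the request layout, the name length limit) is a hypothetical
illustration, not the interface of the Red Hat DLM or of SAF AIS; only the
six lock modes are the conventional DLM ones.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Conventional DLM lock modes, in their usual order. */
enum dlmu_mode {
    DLMU_NL,  /* null */
    DLMU_CR,  /* concurrent read */
    DLMU_CW,  /* concurrent write */
    DLMU_PR,  /* protected read */
    DLMU_PW,  /* protected write */
    DLMU_EX   /* exclusive */
};

#define DLMU_NAME_MAX 64  /* hypothetical resource-name limit */

/* Hypothetical request block passed through ioctl().  Fixed-width
 * fields keep the layout identical for 32-bit and 64-bit callers. */
struct dlmu_lock_req {
    uint32_t mode;     /* in: one of enum dlmu_mode */
    uint32_t flags;    /* in: reserved, must be zero */
    uint32_t lkid;     /* out on lock, in on unlock */
    uint32_t namelen;  /* in: bytes used in name[] */
    char     name[DLMU_NAME_MAX];
};

/* Catch accidental padding at compile time. */
static_assert(sizeof(struct dlmu_lock_req) == 16 + DLMU_NAME_MAX,
              "dlmu_lock_req must not contain padding");

#define DLMU_IOC_MAGIC  'D'  /* hypothetical, not a registered number */
#define DLMU_IOC_LOCK   _IOWR(DLMU_IOC_MAGIC, 1, struct dlmu_lock_req)
#define DLMU_IOC_UNLOCK _IOW(DLMU_IOC_MAGIC, 2, struct dlmu_lock_req)
```

The kernel module would then translate each request into calls on the
in-kernel DLM, so the lock manager itself stays in the kernel and only the
API crosses the boundary.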

Regards,

Daniel
