From: Patrick Caulfield <pcaulfie@redhat.com>
To: linux clustering <linux-cluster@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>,
	ak@suse.de, linux-fsdevel@vger.kernel.org,
	Joel.Becker@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [Linux-cluster] Re: GFS, what's remaining
Date: Wed, 14 Sep 2005 10:01:23 +0100
Message-ID: <4327E6E3.3050501@redhat.com>
In-Reply-To: <1125922894.8714.14.camel@localhost.localdomain>

I've just returned from holiday, so I'm late to this discussion. Let me tell
you what we do now and why, and let's see what's wrong with it.

Currently the library create_lockspace() call returns an FD upon which all lock
operations happen. The FD refers to a misc device, one per lockspace, so if you
want lockspace protection it can happen at that level. There is no protection
applied to locks within a lockspace, nor do I think it's helpful to do so, to be
honest. Using a misc device limits you to <255 lockspaces, depending on the
other uses of misc, but this applies only to userland-visible lockspaces - it
does not affect GFS filesystems, for instance.

Lock/convert/unlock operations are done using write calls on that lockspace FD.
Callbacks are implemented using poll and read on the FD; read will return data
blocks (one per callback) as long as there are active callbacks to process. The
current read functionality behaves more like a SOCK_PACKET than a data stream,
which some may not like, but then you're going to need to know what you're
reading from the device anyway.
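
To make the poll/read pattern above concrete, here is a minimal sketch. It does
not use the real dlm_device interface: a local SOCK_DGRAM socket pair stands in
for the lockspace FD (datagrams keep record boundaries, so each read() returns
exactly one record, which is the packet-like behaviour described above), and
struct cb_record, open_fake_lockspace() and drain_callbacks() are hypothetical
names invented purely for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback record; NOT the real dlm_device layout. */
struct cb_record {
    unsigned int lock_id;   /* which lock this callback refers to */
    int status;             /* completion status, 0 meaning success */
};

/*
 * Stand-in for a lockspace FD (assumption for this sketch only).
 * fds[0] plays the application side, fds[1] plays the "kernel" side
 * that queues callbacks. Returns 0 on success, -1 on error.
 */
int open_fake_lockspace(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_DGRAM, 0, fds);
}

/*
 * Read every callback that is pending right now, without blocking.
 * Each successful read() yields exactly one record.
 * Returns the number of records stored in out, or -1 on error.
 */
int drain_callbacks(int fd, struct cb_record *out, int max)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = 0;

    assert(out != NULL && max >= 0);
    while (n < max) {
        int ready = poll(&pfd, 1, 0);       /* timeout 0: just check */
        if (ready < 0)
            return -1;
        if (ready == 0 || !(pfd.revents & POLLIN))
            break;                          /* nothing left to process */
        ssize_t len = read(fd, &out[n], sizeof(out[n]));
        if (len != (ssize_t)sizeof(out[n]))
            return -1;                      /* short or failed read */
        n++;
    }
    return n;
}
```

An application would put that one FD into its main poll loop and call
something like drain_callbacks() whenever it becomes readable, however many
locks it holds in the lockspace.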

ioctl/fcntl isn't really useful for DLM locks because you can't do asynchronous
operations on them - the lock has to succeed or fail in the one operation - if
you want a callback for completion (or blocking notification) you have to poll
the lockspace FD anyway and then you might as well go back to using read and
write because at least they are something of a matched pair. Something similar
applies, I think, to a syscall interface.

Another reason the existing fcntl interface isn't appropriate is that it's not
locking the same kind of thing. Current Unix fcntl calls lock byte ranges. DLM
locks arbitrary names and has a much richer list of lock modes. Adding another
fcntl just runs into the problems mentioned above.
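
As a rough illustration of the difference in what gets locked, the sketch below
puts the two side by side. make_range_lock() fills in a standard POSIX struct
flock, which can only describe a byte range plus read/write. The enum and
demo_modes_compatible() are hypothetical names of my own, not the libdlm API;
they encode the classic six DLM modes (NL, CR, CW, PR, PW, EX) and the usual
compatibility rules between them, and are only meant to show how much more a
DLM request has to express.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* POSIX fcntl locking: a byte range of an open file, read or write only. */
struct flock make_range_lock(short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;           /* F_RDLCK, F_WRLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;     /* start is relative to start of file */
    fl.l_start = start;
    fl.l_len = len;
    return fl;
}

/* Hypothetical identifiers for the six classic DLM lock modes. */
enum demo_mode {
    DEMO_NL,    /* null */
    DEMO_CR,    /* concurrent read */
    DEMO_CW,    /* concurrent write */
    DEMO_PR,    /* protected read */
    DEMO_PW,    /* protected write */
    DEMO_EX,    /* exclusive */
    DEMO_NMODES
};

/* True if a lock in mode wanted can be granted while held is granted. */
bool demo_modes_compatible(enum demo_mode held, enum demo_mode wanted)
{
    static const bool compat[DEMO_NMODES][DEMO_NMODES] = {
        /*          NL CR CW PR PW EX */
        /* NL */  { 1, 1, 1, 1, 1, 1 },
        /* CR */  { 1, 1, 1, 1, 1, 0 },
        /* CW */  { 1, 1, 1, 0, 0, 0 },
        /* PR */  { 1, 1, 0, 1, 0, 0 },
        /* PW */  { 1, 1, 0, 0, 0, 0 },
        /* EX */  { 1, 0, 0, 0, 0, 0 },
    };

    assert((int)held >= 0 && held < DEMO_NMODES);
    assert((int)wanted >= 0 && wanted < DEMO_NMODES);
    return compat[held][wanted];
}
```

An fcntl lock request has nowhere to put either an arbitrary resource name or
four of those six modes.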

The other reason we use read for callbacks is that there is information to be
passed back: lock status, value block and (possibly) query information.

While having an FD per lock sounds like a nice unixy idea I don't think it would
work very well in practice. Applications with hundreds or thousands of locks
(such as databases) would end up with huge pollfd structs to manage, and while
it helps the refcounting (currently the nastiest bit of the dlm_device code),
it removes the possibility of having persistent locks that exist after the
process exits - a handy feature that some people do use, though I don't think
it's in the currently submitted DLM code. One FD per lock also gives each lock
two handles, the lock ID used internally by the DLM and the FD used externally
by the application, which I think is a little confusing.

I don't think a dlmfs is useful, personally. The features you can export from it
are either minimal compared to the full DLM functionality (so you have to export
the rest by some other means anyway) or are going to be so un-filesystemlike as
to be very awkward to use. Doing lock operations in shell scripts is all very
cool but how often do you /really/ need to do that?

I'm not saying that what we have is perfect - far from it - but we have thought
about how this works and what we came up with seems like a good compromise
between providing full DLM functionality to userspace and using unix features.
But we're very happy to listen to other ideas - and have been doing so, I hope.

-- 

patrick

