All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'Jeff Layton'" <jlayton@redhat.com>
Cc: <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<nfs-ganesha-devel@lists.sourceforge.net>,
	<samba-technical@lists.samba.org>
Subject: RE: [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks
Date: Fri, 11 Oct 2013 11:53:30 -0700	[thread overview]
Message-ID: <020301cec6b3$30aa5d00$91ff1700$@mindspring.com> (raw)
In-Reply-To: <20131011144240.007f96c1@tlielax.poochiereds.net>

> > > > > > At LSF this year, there was a discussion about the "wishlist"
> > > > > > for userland file servers. One of the things brought up was
> > > > > > the goofy and problematic behavior of POSIX locks when a file is
> closed.
> > > > > > Boaz started a thread on it here:
> > > > > >
> > > > > >     http://permalink.gmane.org/gmane.linux.file-systems/73364
> > > > > >
> > > > > > Userland fileservers often need to maintain more than one open
> > > > > > file descriptor on a file. The POSIX spec says:
> > > > > >
> > > > > > "All locks associated with a file for a given process shall be
> > > > > > removed when a file descriptor for that file is closed by that
> > > > > > process or the process holding that file descriptor terminates."
> > > > > >
> > > > > > This is problematic since you can't close any file descriptor
> > > > > > without dropping all your POSIX locks. Most userland file
> > > > > > servers therefore end up opening the file with more access
> > > > > > than is really necessary, and keeping fd's open for longer
> > > > > > than is necessary to
> > work
> > > around this.
> > > > > >
> > > > > > This patchset is a first stab at an approach to address this
> > > > > > problem by adding two new l_type values -- F_RDLCKP and
> > > > > > F_WRLCKP (the 'P' is short for "private" -- I'm open to
> > > > > > changing that if you have a better mnemonic).
> > > > > >
> > > > > > For all intents and purposes these lock types act just like
> > > > > > their "non-P" counterpart. The difference is that they are
> > > > > > only implicitly released when the fd against which they were
> > > > > > acquired is closed. As a side effect, these locks cannot be
> > > > > > merged with "non-P" locks since they have different semantics on
> close.
> > > > > >
> > > > > > I've given this patchset some very basic smoke testing and it
> > > > > > seems to do the right thing, but it is still pretty rough. If
> > > > > > this looks reasonable I'll plan to do some documentation
> > > > > > updates and will take a stab at trying to get these new lock
> > > > > > types added to the POSIX spec (as HCH recommended).
> > > > > >
> > > > > > At this point, my main questions are:
> > > > > >
> > > > > > 1) does this look useful, particularly for fileserver
implementors?
> > > > > >
> > > > > > 2) does this look OK API-wise? We could consider different "cmd"
> > > values
> > > > > >    or even different syscalls, but I figured this makes it
> > > > > > clearer
> > that
> > > > > >    "P" and "non-P" locks will still conflict with one another.
> > > >
> > > > This is a good start.
> > > >
> > > > I'd prefer a model where the private locks are maintained even if
> > > > all file descriptors are closed and released on garbage collection
> > > > when the process terminates. The model presented would require a
> > > > server to potentially have at least two file descriptors open (the
> > > > descriptor originally used for the locks, and a descriptor used
> > > > for current access mode needed for some I/O operation). The server
> > > > will also need to "remember" to do all locks using the first file
> descriptor.
> > > >
> > >
> > > That's sort of a non-starter, I think at least in Linux. If you have
> > > no
> > open file
> > > descriptor then you have nothing to hang the lock off of.
> > > That sort of interface sounds error-prone and "leaky" too. A long
> > > running process could easily end up leaking POSIX locks over time if
> > > you forget to explicitly unlock them.
> >
> > There is a point there, however see below for discussion of file
> > descriptor resources.
> >
> > > > Another thing that would be very useful for servers is to be able
> > > > to specify an arbitrary lock owner. Currently, Ganesha has to
> > > > manage a union of all locks held on a file and carefully pick it
> > > > apart when a client does an unlock. Allowing a process specified
> > > > owner would allow Ganesha (or other
> > > > servers) to have separate locks for each client lock owner.
> > > >
> > >
> > > The trivial answer there would be to give each lockowner its own
> > > file descriptor, right?
> >
> > Hmm, that would be a solution (of course that would imply that private
> > locks held by the same process but by different file descriptors would
> > conflict appropriately).
> >
> 
> Good point. In the implementation I have so far, POSIX locks held by the
> same process don't conflict, just like normal POSIX locks do. For these
sorts
> of locks, I think it would make sense to have more flock()-like behavior
there,
> such that locks held by the same process on different file descriptors
will still
> conflict. I'll plan to make that change on the next pass.

Yea, and I think the "private" being part of the descriptor of these locks
will make those semantics make sense. These locks are private to the
particular file descriptor.

> > There is a resource issue though of how many file descriptors we have
> open.
> > Is there any practical limit on the number of file descriptors a
> > process has open? Can the kernel support 1000s of descriptors? How
> > much resource does a file descriptor take? Looks like a struct file
> > isn't tiny, not quite sure just how big it is.
> >
> > There is also some consideration of how this interacts with share
> > reservations (where is that proposal going BTW?). But I don't think
> > this really introduces anything new. We still have to guess the best
> > access mode to open a file descriptor that will be used for locks no
> > matter how we implement this.
> >
> 
> At least in the currently proposed patchset by Pavel, share reservations
are
> orthogonal to these since they're based on LOCK_MAND
> flock() locks.

Sure, and I don't think they'll cause an issue. Hmm, what about delegations?
An out of tree file system I used to be associated with had issues with
Ganesha's opening files read/write because of the POSIX behavior caused fits
with delegations.

> > So I guess my big concern is the resource impact of lots of file
> > descriptors.
> >
> 
> That's understandable. I'm not clear on how big an overhead there is on
> maintaining an open file descriptor. OTOH, people use flock() and such and
it
> has similar requirements.

Let's mull on that some more. Does anyone have a quick answer to the amount
of memory used by an open file descriptor, and the related question of how
many open file descriptors it's reasonable for one process to have?

> I guess my main concern is that while I'm interested in adding interfaces
that
> make it _easier_ to implement fileservers, I'm not terribly interested in
> adding interfaces that are _specific_ to implementing them.
> 
> Whatever interface we add needs to be generic enough to be useful to other
> applications as well. The changes you're suggesting sound rather specific
to a
> particular use-case.

Sure, understood, though the Samba folks may have some input here also
(though at least they have a process per connection still right? So the
trouble would be a client with lots of processes running on it, not lots of
clients running one (or a few) process each.

Frank

  reply	other threads:[~2013-10-11 18:53 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-11 12:25 [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks Jeff Layton
2013-10-11 12:25 ` Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 1/5] locks: consolidate checks for compatible filp->f_mode values in setlk handlers Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 2/5] locks: add definitions for F_RDLCKP and F_WRLCKP Jeff Layton
2013-10-11 12:25   ` Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 3/5] locks: skip FL_FILP_PRIVATE locks on close unless we're closing the correct filp Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 4/5] locks: handle merging of locks when FL_FILP_PRIVATE is set Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 5/5] locks: show private lock types in /proc/locks Jeff Layton
2013-10-11 13:35 ` [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks Jeff Layton
2013-10-11 13:35   ` Jeff Layton
2013-10-11 15:20   ` Frank Filz
2013-10-11 15:50     ` Jeff Layton
2013-10-11 15:50       ` Jeff Layton
2013-10-11 17:07       ` Frank Filz
2013-10-11 18:42         ` Jeff Layton
2013-10-11 18:42           ` Jeff Layton
2013-10-11 18:53           ` Frank Filz [this message]
2013-10-12  9:10             ` Volker Lendecke
2013-10-12  9:10               ` Volker Lendecke
2013-10-11 20:07 ` J. Bruce Fields
2013-10-11 20:07   ` J. Bruce Fields
2013-10-11 21:36   ` Andreas Dilger
2013-10-11 21:36     ` Andreas Dilger
2013-10-11 23:21     ` Jeff Layton
2013-10-11 23:21       ` Jeff Layton
2013-10-11 23:49       ` Jeremy Allison
2013-10-12  0:18         ` Scott Lovenberg
2013-10-12  0:18           ` Scott Lovenberg
2013-10-12  0:42           ` Jeff Layton
2013-10-12  0:42             ` Jeff Layton
2013-10-12 18:12             ` Frank Filz
2013-10-14  7:24               ` Volker Lendecke
2013-10-14 15:23                 ` Frank Filz
2013-10-15  8:56                   ` Volker Lendecke
2013-10-15  8:56                     ` Volker Lendecke
2013-10-12 20:56             ` Scott Lovenberg
2013-10-12  9:20 ` Stefan (metze) Metzmacher
2013-10-12 11:47   ` Jeff Layton
2013-10-12 18:10     ` [Nfs-ganesha-devel] " Frank Filz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='020301cec6b3$30aa5d00$91ff1700$@mindspring.com' \
    --to=ffilzlnx@mindspring.com \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nfs-ganesha-devel@lists.sourceforge.net \
    --cc=samba-technical@lists.samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.