linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'Jeff Layton'" <jlayton@redhat.com>
Cc: <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<nfs-ganesha-devel@lists.sourceforge.net>,
	<samba-technical@lists.samba.org>
Subject: RE: [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks
Date: Fri, 11 Oct 2013 11:53:30 -0700	[thread overview]
Message-ID: <020301cec6b3$30aa5d00$91ff1700$@mindspring.com> (raw)
In-Reply-To: <20131011144240.007f96c1@tlielax.poochiereds.net>

> > > > > > At LSF this year, there was a discussion about the "wishlist"
> > > > > > for userland file servers. One of the things brought up was
> > > > > > the goofy and problematic behavior of POSIX locks when a file is
> closed.
> > > > > > Boaz started a thread on it here:
> > > > > >
> > > > > >     http://permalink.gmane.org/gmane.linux.file-systems/73364
> > > > > >
> > > > > > Userland fileservers often need to maintain more than one open
> > > > > > file descriptor on a file. The POSIX spec says:
> > > > > >
> > > > > > "All locks associated with a file for a given process shall be
> > > > > > removed when a file descriptor for that file is closed by that
> > > > > > process or the process holding that file descriptor terminates."
> > > > > >
> > > > > > This is problematic since you can't close any file descriptor
> > > > > > without dropping all your POSIX locks. Most userland file
> > > > > > servers therefore end up opening the file with more access
> > > > > > than is really necessary, and keeping fd's open for longer
> > > > > > than is necessary to
> > work
> > > around this.
> > > > > >
> > > > > > This patchset is a first stab at an approach to address this
> > > > > > problem by adding two new l_type values -- F_RDLCKP and
> > > > > > F_WRLCKP (the 'P' is short for "private" -- I'm open to
> > > > > > changing that if you have a better mnemonic).
> > > > > >
> > > > > > For all intents and purposes these lock types act just like
> > > > > > their "non-P" counterpart. The difference is that they are
> > > > > > only implicitly released when the fd against which they were
> > > > > > acquired is closed. As a side effect, these locks cannot be
> > > > > > merged with "non-P" locks since they have different semantics on
> close.
> > > > > >
> > > > > > I've given this patchset some very basic smoke testing and it
> > > > > > seems to do the right thing, but it is still pretty rough. If
> > > > > > this looks reasonable I'll plan to do some documentation
> > > > > > updates and will take a stab at trying to get these new lock
> > > > > > types added to the POSIX spec (as HCH recommended).
> > > > > >
> > > > > > At this point, my main questions are:
> > > > > >
> > > > > > 1) does this look useful, particularly for fileserver
implementors?
> > > > > >
> > > > > > 2) does this look OK API-wise? We could consider different "cmd"
> > > values
> > > > > >    or even different syscalls, but I figured this makes it
> > > > > > clearer
> > that
> > > > > >    "P" and "non-P" locks will still conflict with one another.
> > > >
> > > > This is a good start.
> > > >
> > > > I'd prefer a model where the private locks are maintained even if
> > > > all file descriptors are closed and released on garbage collection
> > > > when the process terminates. The model presented would require a
> > > > server to potentially have at least two file descriptors open (the
> > > > descriptor originally used for the locks, and a descriptor used
> > > > for current access mode needed for some I/O operation). The server
> > > > will also need to "remember" to do all locks using the first file
> descriptor.
> > > >
> > >
> > > That's sort of a non-starter, I think at least in Linux. If you have
> > > no
> > open file
> > > descriptor then you have nothing to hang the lock off of.
> > > That sort of interface sounds error-prone and "leaky" too. A long
> > > running process could easily end up leaking POSIX locks over time if
> > > you forget to explicitly unlock them.
> >
> > There is a point there, however see below for discussion of file
> > descriptor resources.
> >
> > > > Another thing that would be very useful for servers is to be able
> > > > to specify an arbitrary lock owner. Currently, Ganesha has to
> > > > manage a union of all locks held on a file and carefully pick it
> > > > apart when a client does an unlock. Allowing a process specified
> > > > owner would allow Ganesha (or other
> > > > servers) to have separate locks for each client lock owner.
> > > >
> > >
> > > The trivial answer there would be to give each lockowner its own
> > > file descriptor, right?
> >
> > Hmm, that would be a solution (of course that would imply that private
> > locks held by the same process but by different file descriptors would
> > conflict appropriately).
> >
> 
> Good point. In the implementation I have so far, POSIX locks held by the
> same process don't conflict, just like normal POSIX locks do. For these
sorts
> of locks, I think it would make sense to have more flock()-like behavior
there,
> such that locks held by the same process on different file descriptors
will still
> conflict. I'll plan to make that change on the next pass.

Yea, and I think the "private" being part of the descriptor of these locks
will make those semantics make sense. These locks are private to the
particular file descriptor.

> > There is a resource issue though of how many file descriptors we have
> open.
> > Is there any practical limit on the number of file descriptors a
> > process has open? Can the kernel support 1000s of descriptors? How
> > much resource does a file descriptor take? Looks like a struct file
> > isn't tiny, not quite sure just how big it is.
> >
> > There is also some consideration of how this interacts with share
> > reservations (where is that proposal going BTW?). But I don't think
> > this really introduces anything new. We still have to guess the best
> > access mode to open a file descriptor that will be used for locks no
> > matter how we implement this.
> >
> 
> At least in the currently proposed patchset by Pavel, share reservations
are
> orthogonal to these since they're based on LOCK_MAND
> flock() locks.

Sure, and I don't think they'll cause an issue. Hmm, what about delegations?
An out of tree file system I used to be associated with had issues with
Ganesha's opening files read/write because of the POSIX behavior caused fits
with delegations.

> > So I guess my big concern is the resource impact of lots of file
> > descriptors.
> >
> 
> That's understandable. I'm not clear on how big an overhead there is on
> maintaining an open file descriptor. OTOH, people use flock() and such and
it
> has similar requirements.

Let's mull on that some more. Does anyone have a quick answer to the amount
of memory used by an open file descriptor, and the related question of how
many open file descriptors it's reasonable for one process to have?

> I guess my main concern is that while I'm interested in adding interfaces
that
> make it _easier_ to implement fileservers, I'm not terribly interested in
> adding interfaces that are _specific_ to implementing them.
> 
> Whatever interface we add needs to be generic enough to be useful to other
> applications as well. The changes you're suggesting sound rather specific
to a
> particular use-case.

Sure, understood, though the Samba folks may have some input here also
(though at least they have a process per connection still right? So the
trouble would be a client with lots of processes running on it, not lots of
clients running one (or a few) process each.

Frank

  reply	other threads:[~2013-10-11 18:53 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-11 12:25 [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 1/5] locks: consolidate checks for compatible filp->f_mode values in setlk handlers Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 2/5] locks: add definitions for F_RDLCKP and F_WRLCKP Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 3/5] locks: skip FL_FILP_PRIVATE locks on close unless we're closing the correct filp Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 4/5] locks: handle merging of locks when FL_FILP_PRIVATE is set Jeff Layton
2013-10-11 12:25 ` [RFC PATCH 5/5] locks: show private lock types in /proc/locks Jeff Layton
2013-10-11 13:35 ` [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks Jeff Layton
2013-10-11 15:20   ` Frank Filz
2013-10-11 15:50     ` Jeff Layton
2013-10-11 17:07       ` Frank Filz
2013-10-11 18:42         ` Jeff Layton
2013-10-11 18:53           ` Frank Filz [this message]
2013-10-12  9:10             ` Volker Lendecke
2013-10-11 20:07 ` J. Bruce Fields
2013-10-11 21:36   ` Andreas Dilger
2013-10-11 23:21     ` Jeff Layton
2013-10-11 23:49       ` Jeremy Allison
2013-10-12  0:18         ` Scott Lovenberg
2013-10-12  0:42           ` Jeff Layton
2013-10-12 18:12             ` Frank Filz
2013-10-14  7:24               ` Volker Lendecke
2013-10-14 15:23                 ` Frank Filz
2013-10-15  8:56                   ` Volker Lendecke
2013-10-12 20:56             ` Scott Lovenberg
2013-10-12  9:20 ` Stefan (metze) Metzmacher
2013-10-12 11:47   ` Jeff Layton
2013-10-12 18:10     ` [Nfs-ganesha-devel] " Frank Filz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='020301cec6b3$30aa5d00$91ff1700$@mindspring.com' \
    --to=ffilzlnx@mindspring.com \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nfs-ganesha-devel@lists.sourceforge.net \
    --cc=samba-technical@lists.samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).