Linux NFS development
 help / color / mirror / Atom feed
From: Jeff Layton <jeff.layton@primarydata.com>
To: Mike Marshall <hubcap@omnibond.com>
Cc: linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	David Howells <dhowells@redhat.com>,
	linux-kernel@vger.kernel.org, linux-cifs@vger.kernel.org,
	linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org
Subject: Re: [PATCH v3 00/13] locks: saner method for managing file locks
Date: Mon, 2 Feb 2015 15:42:43 -0500	[thread overview]
Message-ID: <20150202154243.7023391f@tlielax.poochiereds.net> (raw)
In-Reply-To: <CAOg9mSQSW68-aCh529BdJJ-G3xeWGY3NsNoizPu5wNdrZz88vA@mail.gmail.com>

On Mon, 2 Feb 2015 15:29:33 -0500
Mike Marshall <hubcap@omnibond.com> wrote:

> I applied Jeff's patch to my orangefs-3.19-rc5 tree. Orangefs
> doesn't yet support the lock callout for file_operations, but
> we have been experimenting with some ideas that would allow
> Orangefs to honor locks in our distributed environment: basically
> posix locks for each kernel client plus meta data in our userspace
> servers.
> 
> Anyhow, the lock callout in the Orangefs patch I've posted
> just returns ENOSYS. I have added posix locks in a current
> test, so now I have:
> 
> int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
> {
>         int rc;
> 
>         rc = posix_lock_file(filp, fl, NULL);
> 
>         return rc;
> }
> 
> So... while this isn't safe for our real world distributed environment,
> it makes the kernel with this version of the Orangefs kernel client
> loaded on it think that locks work.
> 
> Besides observing that locks seem to work by running some programs
> that need locks (like svn/sqlite) I also ran xfstest generic/131.
> 
> Without Jeff's patch, xfstest generic/131 fails:
> 
>   29:Verify that F_GETLK for F_WRLCK doesn't require that file be
>      opened for write
> 
> Same with Jeff's patch.
> 
> Jeff's patch applied painlessly, and my simple tests didn't uncover
> any problems with Jeff's implementation of separate lists for different
> lock types, so that's good.
> 
> What surprised me, though, is that the posix lock code failed
> to get all the way through xfstest generic/131... maybe you
> all knew that already?
> 
> -Mike
> 

Hmm... I haven't seen 131 fail like this, but your code above looks
wrong. You're ignoring the "cmd" argument, so even F_GETLK requests are
setting a lock.

I think you might want to do something like:

int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
{
	if (cmd == F_GETLK)
		return posix_test_lock(filp, fl);
	return posix_lock_file(filp, fl, fl);
}

...untested of course, but you get the idea.


> On Thu, Jan 22, 2015 at 9:27 AM, Jeff Layton
> <jeff.layton@primarydata.com> wrote:
> > v3:
> > - break out a ceph locking cleanup patch into a separate one earlier
> >   in the series
> > - switch back to using the i_lock to protect assignment of i_flctx.
> >   Using cmpxchg() was subject to races that I couldn't quite get a
> >   grip on. Eventually I may switch it back, but it probably doesn't
> >   provide much benefit.
> > - add a patch to clean up some comments that refer to i_flock
> > - use list_empty_careful in lockless checks rather than list_empty
> >
> > v2:
> > - bugfix to the flc_posix list ordering that broke the split/merge code
> > - don't use i_lock to manage the i_flctx pointer. Do so locklessly
> >   via cmpxchg().
> > - reordering of the patches to make the set bisectable. As a result
> >   the new spinlock is not introduced until near the end of the set
> > - some cleanup of the lm_change operation was added, made possible
> >   by the move to standard list_heads for tracking locks
> > - it takes greater pains to avoid spinlocking by checking when the
> >   lists are empty in addition to whether the i_flctx pointer is NULL
> > - a patch was added to keep count of the number of locks, so we can
> >   avoid having to do count/alloc/populate in ceph and cifs
> >
> > This is the third iteration of this patchset. The second was posted
> > on January 8th, under the cover letter entitled:
> >
> >     [PATCH v2 00/10] locks: saner method for managing file locks
> >
> > The big change here is that it once again uses the i_lock to protect the
> > i_flctx pointer assignment. In principle we should be able to use
> > cmpxchg instead, but that seems leave open a race that causes
> > locks_remove_file to miss recently-added locks. Given that is not a
> > super latency-sensitive codepath, an extra spinlock shouldn't make much
> > difference.
> >
> > Many thanks to Sasha Levin who helped me nail a race that would leave
> > leftover locks on the i_flock list, and David Howells who convinced
> > me to just go ahead back to using the i_lock to protect that field.
> >
> > There are some other minor changes, but overall it's pretty similar
> > to the last set. I still plan to merge this for v3.20.
> >
> > ------------------------[snip]-------------------------
> >
> > We currently manage all file_locks via a singly-linked list. This is
> > problematic for a number of reasons:
> >
> > - we have to protect all file locks with the same spinlock (or
> >   equivalent). Currently that uses the i_lock, but Christoph has voiced
> >   objections due to the potential for contention with other i_lock
> >   users. He'd like to see us move to using a different lock.
> >
> > - we have to walk through irrelevant file locks in order to get to the
> >   ones we're interested in. For instance, POSIX locks are at the end
> >   of the list, so we have to skip over all of the flock locks and
> >   leases before we can work with them.
> >
> > - the singly-linked list is primitive and difficult to work with. We
> >   have to keep track of the "before" pointer and it's easy to get that
> >   wrong.
> >
> > Cleaning all of this up is complicated by the fact that no one really
> > wants to grow struct inode in order to do so. We have a single pointer
> > in the inode now and I don't think we want to use any more.
> >
> > One possibility that Trond raised was to move this to an hlist, but
> > that doesn't do anything about the desire for a new spinlock.
> >
> > This patchset takes the approach of replacing the i_flock list with a
> > new struct file_lock_context that is allocated when we intend to add a
> > new file lock to an inode. The file_lock_context is only freed when we
> > destroy the inode.
> >
> > Within that, we have separate (and standard!) lists for each lock type,
> > and a dedicated spinlock for managing those lists. In principle we could
> > even consider adding separate locks for each, but I didn't bother with
> > that for now.
> >
> > For now, the code is still pretty "raw" and isn't bisectable. This is
> > just a RFC for the basic approach. This is probably v3.19 material at
> > best.
> >
> > Anyone have thoughts or comments on the basic approach?
> >
> > Jeff Layton (13):
> >   locks: add new struct list_head to struct file_lock
> >   locks: have locks_release_file use flock_lock_file to release generic
> >     flock locks
> >   locks: add a new struct file_locking_context pointer to struct inode
> >   ceph: move spinlocking into ceph_encode_locks_to_buffer and
> >     ceph_count_locks
> >   locks: move flock locks to file_lock_context
> >   locks: convert posix locks to file_lock_context
> >   locks: convert lease handling to file_lock_context
> >   locks: remove i_flock field from struct inode
> >   locks: add a dedicated spinlock to protect i_flctx lists
> >   locks: clean up the lm_change prototype
> >   locks: keep a count of locks on the flctx lists
> >   locks: consolidate NULL i_flctx checks in locks_remove_file
> >   locks: update comments that refer to inode->i_flock
> >
> >  fs/ceph/locks.c      |  64 +++---
> >  fs/ceph/mds_client.c |   4 -
> >  fs/cifs/file.c       |  34 +--
> >  fs/inode.c           |   3 +-
> >  fs/lockd/svcsubs.c   |  26 ++-
> >  fs/locks.c           | 569 +++++++++++++++++++++++++++------------------------
> >  fs/nfs/delegation.c  |  23 ++-
> >  fs/nfs/nfs4state.c   |  70 ++++---
> >  fs/nfs/pagelist.c    |   6 +-
> >  fs/nfs/write.c       |  41 +++-
> >  fs/nfsd/nfs4state.c  |  21 +-
> >  fs/read_write.c      |   2 +-
> >  include/linux/fs.h   |  52 +++--
> >  13 files changed, 510 insertions(+), 405 deletions(-)
> >
> > --
> > 2.1.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Jeff Layton <jeff.layton@primarydata.com>

  reply	other threads:[~2015-02-02 20:42 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-22 14:27 [PATCH v3 00/13] locks: saner method for managing file locks Jeff Layton
2015-01-22 14:27 ` [PATCH v3 01/13] locks: add new struct list_head to struct file_lock Jeff Layton
2015-01-22 14:27 ` [PATCH v3 02/13] locks: have locks_release_file use flock_lock_file to release generic flock locks Jeff Layton
2015-01-22 14:27 ` [PATCH v3 03/13] locks: add a new struct file_locking_context pointer to struct inode Jeff Layton
2015-01-22 14:27 ` [PATCH v3 04/13] ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks Jeff Layton
2015-01-22 14:27 ` [PATCH v3 05/13] locks: move flock locks to file_lock_context Jeff Layton
2015-01-22 14:27 ` [PATCH v3 06/13] locks: convert posix " Jeff Layton
2015-01-22 14:27 ` [PATCH v3 07/13] locks: convert lease handling " Jeff Layton
2015-01-22 14:27 ` [PATCH v3 08/13] locks: remove i_flock field from struct inode Jeff Layton
2015-01-22 14:27 ` [PATCH v3 09/13] locks: add a dedicated spinlock to protect i_flctx lists Jeff Layton
2015-01-22 14:27 ` [PATCH v3 10/13] locks: clean up the lm_change prototype Jeff Layton
2015-01-22 14:27 ` [PATCH v3 11/13] locks: keep a count of locks on the flctx lists Jeff Layton
2015-01-22 14:27 ` [PATCH v3 12/13] locks: consolidate NULL i_flctx checks in locks_remove_file Jeff Layton
2015-01-22 14:27 ` [PATCH v3 13/13] locks: update comments that refer to inode->i_flock Jeff Layton
2015-02-02 20:29 ` [PATCH v3 00/13] locks: saner method for managing file locks Mike Marshall
2015-02-02 20:42   ` Jeff Layton [this message]
2015-02-03 18:01     ` Mike Marshall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150202154243.7023391f@tlielax.poochiereds.net \
    --to=jeff.layton@primarydata.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=dhowells@redhat.com \
    --cc=hch@infradead.org \
    --cc=hubcap@omnibond.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox