From: Jeff Layton <jeff.layton@primarydata.com>
To: Mike Marshall <hubcap@omnibond.com>
Cc: linux-fsdevel@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Sasha Levin <sasha.levin@oracle.com>,
David Howells <dhowells@redhat.com>,
linux-kernel@vger.kernel.org, linux-cifs@vger.kernel.org,
linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org
Subject: Re: [PATCH v3 00/13] locks: saner method for managing file locks
Date: Mon, 2 Feb 2015 15:42:43 -0500
Message-ID: <20150202154243.7023391f@tlielax.poochiereds.net>
In-Reply-To: <CAOg9mSQSW68-aCh529BdJJ-G3xeWGY3NsNoizPu5wNdrZz88vA@mail.gmail.com>
On Mon, 2 Feb 2015 15:29:33 -0500
Mike Marshall <hubcap@omnibond.com> wrote:
> I applied Jeff's patch to my orangefs-3.19-rc5 tree. Orangefs
> doesn't yet support the lock callout for file_operations, but
> we have been experimenting with some ideas that would allow
> Orangefs to honor locks in our distributed environment: basically
> POSIX locks for each kernel client plus metadata in our userspace
> servers.
>
> Anyhow, the lock callout in the Orangefs patch I've posted
> just returns ENOSYS. I have added POSIX locks in a current
> test, so now I have:
>
> int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
> {
>         int rc;
>
>         rc = posix_lock_file(filp, fl, NULL);
>
>         return rc;
> }
>
> So... while this isn't safe for our real world distributed environment,
> it makes the kernel with this version of the Orangefs kernel client
> loaded on it think that locks work.
>
> Besides observing that locks seem to work by running some programs
> that need locks (like svn/sqlite) I also ran xfstest generic/131.
>
> Without Jeff's patch, xfstest generic/131 fails:
>
> 29:Verify that F_GETLK for F_WRLCK doesn't require that file be
> opened for write
>
> It fails the same way with Jeff's patch applied.
>
> Jeff's patch applied painlessly, and my simple tests didn't uncover
> any problems with Jeff's implementation of separate lists for different
> lock types, so that's good.
>
> What surprised me, though, is that the posix lock code failed
> to get all the way through xfstest generic/131... maybe you
> all knew that already?
>
> -Mike
>
Hmm... I haven't seen 131 fail like this, but your code above looks
wrong. You're ignoring the "cmd" argument, so even F_GETLK requests are
setting a lock.
I think you might want to do something like:
int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
{
        if (cmd == F_GETLK)
                return posix_test_lock(filp, fl);

        return posix_lock_file(filp, fl, NULL);
}
...untested of course, but you get the idea.
> On Thu, Jan 22, 2015 at 9:27 AM, Jeff Layton
> <jeff.layton@primarydata.com> wrote:
> > v3:
> > - break out a ceph locking cleanup patch into a separate one earlier
> > in the series
> > - switch back to using the i_lock to protect assignment of i_flctx.
> > Using cmpxchg() was subject to races that I couldn't quite get a
> > grip on. Eventually I may switch it back, but it probably doesn't
> > provide much benefit.
> > - add a patch to clean up some comments that refer to i_flock
> > - use list_empty_careful in lockless checks rather than list_empty
> >
> > v2:
> > - bugfix to the flc_posix list ordering that broke the split/merge code
> > - don't use i_lock to manage the i_flctx pointer. Do so locklessly
> > via cmpxchg().
> > - reordering of the patches to make the set bisectable. As a result
> > the new spinlock is not introduced until near the end of the set
> > - some cleanup of the lm_change operation was added, made possible
> > by the move to standard list_heads for tracking locks
> > - it takes greater pains to avoid spinlocking by checking when the
> > lists are empty in addition to whether the i_flctx pointer is NULL
> > - a patch was added to keep count of the number of locks, so we can
> > avoid having to do count/alloc/populate in ceph and cifs
> >
> > This is the third iteration of this patchset. The second was posted
> > on January 8th, under the cover letter entitled:
> >
> > [PATCH v2 00/10] locks: saner method for managing file locks
> >
> > The big change here is that it once again uses the i_lock to protect the
> > i_flctx pointer assignment. In principle we should be able to use
> > cmpxchg instead, but that seems to leave open a race that causes
> > locks_remove_file to miss recently-added locks. Given that this is not a
> > particularly latency-sensitive codepath, an extra spinlock shouldn't make
> > much difference.
> >
> > Many thanks to Sasha Levin, who helped me nail a race that would leave
> > leftover locks on the i_flock list, and to David Howells, who convinced
> > me to just go back to using the i_lock to protect that field.
> >
> > There are some other minor changes, but overall it's pretty similar
> > to the last set. I still plan to merge this for v3.20.
> >
> > ------------------------[snip]-------------------------
> >
> > We currently manage all file_locks via a singly-linked list. This is
> > problematic for a number of reasons:
> >
> > - we have to protect all file locks with the same spinlock (or
> > equivalent). Currently that uses the i_lock, but Christoph has voiced
> > objections due to the potential for contention with other i_lock
> > users. He'd like to see us move to using a different lock.
> >
> > - we have to walk through irrelevant file locks in order to get to the
> > ones we're interested in. For instance, POSIX locks are at the end
> > of the list, so we have to skip over all of the flock locks and
> > leases before we can work with them.
> >
> > - the singly-linked list is primitive and difficult to work with. We
> > have to keep track of the "before" pointer and it's easy to get that
> > wrong.
> >
> > Cleaning all of this up is complicated by the fact that no one really
> > wants to grow struct inode in order to do so. We have a single pointer
> > in the inode now, and I don't think we want to add any more than that.
> >
> > One possibility that Trond raised was to move this to an hlist, but
> > that doesn't do anything about the desire for a new spinlock.
> >
> > This patchset takes the approach of replacing the i_flock list with a
> > new struct file_lock_context that is allocated when we intend to add a
> > new file lock to an inode. The file_lock_context is only freed when we
> > destroy the inode.
> >
> > Within that, we have separate (and standard!) lists for each lock type,
> > and a dedicated spinlock for managing those lists. In principle we could
> > even consider adding separate locks for each, but I didn't bother with
> > that for now.
> >
> > For now, the code is still pretty "raw" and isn't bisectable. This is
> > just a RFC for the basic approach. This is probably v3.19 material at
> > best.
> >
> > Anyone have thoughts or comments on the basic approach?
> >
> > Jeff Layton (13):
> > locks: add new struct list_head to struct file_lock
> > locks: have locks_release_file use flock_lock_file to release generic
> > flock locks
> > locks: add a new struct file_locking_context pointer to struct inode
> > ceph: move spinlocking into ceph_encode_locks_to_buffer and
> > ceph_count_locks
> > locks: move flock locks to file_lock_context
> > locks: convert posix locks to file_lock_context
> > locks: convert lease handling to file_lock_context
> > locks: remove i_flock field from struct inode
> > locks: add a dedicated spinlock to protect i_flctx lists
> > locks: clean up the lm_change prototype
> > locks: keep a count of locks on the flctx lists
> > locks: consolidate NULL i_flctx checks in locks_remove_file
> > locks: update comments that refer to inode->i_flock
> >
> > fs/ceph/locks.c | 64 +++---
> > fs/ceph/mds_client.c | 4 -
> > fs/cifs/file.c | 34 +--
> > fs/inode.c | 3 +-
> > fs/lockd/svcsubs.c | 26 ++-
> > fs/locks.c | 569 +++++++++++++++++++++++++++------------------------
> > fs/nfs/delegation.c | 23 ++-
> > fs/nfs/nfs4state.c | 70 ++++---
> > fs/nfs/pagelist.c | 6 +-
> > fs/nfs/write.c | 41 +++-
> > fs/nfsd/nfs4state.c | 21 +-
> > fs/read_write.c | 2 +-
> > include/linux/fs.h | 52 +++--
> > 13 files changed, 510 insertions(+), 405 deletions(-)
> >
> > --
> > 2.1.0
> >
--
Jeff Layton <jeff.layton@primarydata.com>