linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Xiubo Li <xiubli@redhat.com>, chuck.lever@oracle.com
Cc: linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC PATCH] filelock: new helper: vfs_file_has_locks
Date: Tue, 15 Nov 2022 09:40:04 -0500	[thread overview]
Message-ID: <4a8720c8a24a9b06adc40fdada9c621fd5d849df.camel@kernel.org> (raw)
In-Reply-To: <54b90281-c575-5aee-e886-e4d7b50236f0@redhat.com>

On Tue, 2022-11-15 at 13:43 +0800, Xiubo Li wrote:
> On 15/11/2022 03:46, Jeff Layton wrote:
> > On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote:
> > > Ceph has a need to know whether a particular file has any locks set on
> > > it. It's currently tracking that by a num_locks field in its
> > > filp->private_data, but that's problematic as it tries to decrement this
> > > field when releasing locks and that can race with the file being torn
> > > down.
> > > 
> > > Add a new vfs_file_has_locks helper that will scan the flock and posix
> > > lists, and return true if any of the locks have a fl_file that matches
> > > the given one. Ceph can then call this instead of doing its own
> > > tracking.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >   fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
> > >   include/linux/fs.h |  1 +
> > >   2 files changed, 37 insertions(+)
> > > 
> > > Xiubo,
> > > 
> > > Here's what I was thinking instead of trying to track this within ceph.
> > > Most inodes never have locks set, so in most cases this will be a NULL
> > > pointer check.
> > > 
> > > 
> > > 
> > I went ahead and added a slightly updated version of this this to my
> > locks-next branch for now, but...
> > 
> > Thinking about this more...I'm not sure this whole concept of what the
> > ceph code is trying to do makes sense. Locks only conflict if they have
> > different owners, and POSIX locks are owned by the process. Consider
> > this scenario (obviously, this is not a problem with OFD locks).
> > 
> > A process has the same file open via two different fds. It sets lock A
> > from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via
> > fd 2. The two locks will be merged, because they don't conflict (because
> > it's the same process).
> > 
> > Against which fd should the merged lock record be counted?
> 
> Thanks Jeff.
> 
> For the above example as you mentioned, from my reading of the lock code 
> after being merged it will always keep the old file_lock's fl_file.
> 
> There is another case that if the Inode already has LockA and LockB:
> 
> Lock A --> [0, 9] --> fileA
> 
> Lock B --> [15, 20] --> fileB
> 
> And then LockC comes:
> 
> Lock C --> [8, 16] --> fileC
> 
> Then the inode will only have the LockB:
> 
> Lock B --> [0, 20] --> fileB.
> 
> So the exiting ceph code seems buggy!
> 

Yeah, there are a number of ways to end up with a different fl_file than
you started with.
 
> > 
> > Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when
> > the fd hasn't had any locks explicitly set on it?
> 
> Maybe we should check whether any POSIX lock exist, if so we should 
> check CEPH_I_ERROR_FILELOCK always. Or we need to check it depending on 
> each fd ?
> 
> 

It was originally added here:

commit ff5d913dfc7142974eb1694d5fd6284658e46bc6
Author: Yan, Zheng <zyan@redhat.com>
Date:   Thu Jul 25 20:16:45 2019 +0800

    ceph: return -EIO if read/write against filp that lost file locks
    
    After mds evicts session, file locks get lost sliently. It's not safe to
    let programs continue to do read/write.
    
    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

So I guess with the current code if you have the file open and set a
lock on it, you'll get back EIO when you try to get caps for it, but if
you never set a lock on the fd, then you wouldn't get an error. We don't
reliably keep track of what fd was used to set a lock (as noted above),
so we can't really do what Zheng was trying to do here.

Having a file where some openers use locking and others don't is a
really odd usage pattern though. Locks are like stoplights -- they only
work if everyone pays attention to them.

I think we should probably switch ceph_get_caps to just check whether
any locks are set on the file. If there are POSIX/OFD/FLOCK locks on the
file at the time, we should set CHECK_FILELOCK, regardless of what fd
was used to set the lock.

In practical terms, we probably want a vfs_inode_has_locks function,
that just tests whether the flc_posix and flc_flock lists are empty.

Maybe something like this instead? Then ceph could call this from
ceph_get_caps and set CHECK_FILELOCK if it returns true.

-------------8<---------------

[PATCH] filelock: new helper: vfs_inode_has_locks

Ceph has a need to know whether a particular inode has any locks set on
it. It's currently tracking that by a num_locks field in its
filp->private_data, but that's problematic as it tries to decrement this
field when releasing locks and that can race with the file being torn
down.

Add a new vfs_inode_has_locks helper that just returns whether any locks
are currently held on the inode.

Cc: Xiubo Li <xiubli@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/locks.c         | 23 +++++++++++++++++++++++
 include/linux/fs.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 5876c8ff0edc..9ccf89b6c95d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2672,6 +2672,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
 }
 EXPORT_SYMBOL_GPL(vfs_cancel_lock);
 
+/**
+ * vfs_inode_has_locks - are any file locks held on @inode?
+ * @inode: inode to check for locks
+ *
+ * Return true if there are any FL_POSIX or FL_FLOCK locks currently
+ * set on @inode.
+ */
+bool vfs_inode_has_locks(struct inode *inode)
+{
+	struct file_lock_context *ctx;
+	bool ret;
+
+	ctx = smp_load_acquire(&inode->i_flctx);
+	if (!ctx)
+		return false;
+
+	spin_lock(&ctx->flc_lock);
+	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
+	spin_unlock(&ctx->flc_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
+
 #ifdef CONFIG_PROC_FS
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e654435f1651..d6cb42b7e91c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
 extern int vfs_test_lock(struct file *, struct file_lock *);
 extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
 extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
+bool vfs_inode_has_locks(struct inode *inode);
 extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
 extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
 extern void lease_get_mtime(struct inode *, struct timespec64 *time);
-- 
2.38.1



  reply	other threads:[~2022-11-15 14:44 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-14 14:07 [RFC PATCH] filelock: new helper: vfs_file_has_locks Jeff Layton
2022-11-14 14:19 ` Chuck Lever III
2022-11-14 19:46 ` Jeff Layton
2022-11-15  5:43   ` Xiubo Li
2022-11-15 14:40     ` Jeff Layton [this message]
2022-11-16  6:16       ` Christoph Hellwig
2022-11-16  6:49       ` Xiubo Li
2022-11-16 10:55         ` Jeff Layton
2022-11-16 11:16           ` Xiubo Li
2022-11-16 11:25             ` Jeff Layton
2022-11-16 13:24               ` Xiubo Li
2022-11-15  8:54 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a8720c8a24a9b06adc40fdada9c621fd5d849df.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).