[PATCH 0/2] Use exclusive lock for file_remove

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] Use exclusive lock for file_remove_privs
@ 2023-08-30 18:15 Bernd Schubert
  2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Bernd Schubert @ 2023-08-30 18:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Josef Bacik,
	linux-btrfs, Alexander Viro, Christian Brauner

While adding shared direct IO write locks to fuse Miklos noticed
that file_remove_privs() needs an exclusive lock. I then
noticed that btrfs actually has the same issue as I had in my patch,
it was calling into that function with a shared lock.
This series adds a new exported function file_needs_remove_privs(),
which used by the follow up btrfs patch and will be used by the
DIO code path in fuse as well. If that function returns any mask
the shared lock needs to be dropped and replaced by the exclusive
variant.

Note: Compilation tested only.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org

Bernd Schubert (2):
  fs: Add and export file_needs_remove_privs
  btrfs: file_remove_privs needs an exclusive lock

 fs/btrfs/file.c    | 37 +++++++++++++++++++++++++++++--------
 fs/inode.c         |  8 ++++++++
 include/linux/fs.h |  1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

-- 
2.39.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] fs: Add and export file_needs_remove_privs
  2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert
@ 2023-08-30 18:15 ` Bernd Schubert
  2023-09-01  6:48   ` Christoph Hellwig
  2023-08-30 18:15 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Bernd Schubert @ 2023-08-30 18:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Josef Bacik,
	linux-btrfs, Alexander Viro, Christian Brauner

File systems want to hold a shared lock for DIO writes,
but may need to drop file priveliges - that a requires an
exclusive lock. The new export function file_needs_remove_privs()
is added in order to first check if that is needed.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/inode.c         | 8 ++++++++
 include/linux/fs.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index 67611a360031..9b05db602e41 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2013,6 +2013,14 @@ int dentry_needs_remove_privs(struct mnt_idmap *idmap,
 	return mask;
 }
 
+int file_needs_remove_privs(struct file *file)
+{
+	struct dentry *dentry = file_dentry(file);
+
+	return dentry_needs_remove_privs(file_mnt_idmap(file), dentry);
+}
+EXPORT_SYMBOL_GPL(file_needs_remove_privs);
+
 static int __remove_privs(struct mnt_idmap *idmap,
 			  struct dentry *dentry, int kill)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 562f2623c9c9..9245f0de00bc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2721,6 +2721,7 @@ extern struct inode *new_inode_pseudo(struct super_block *sb);
 extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int setattr_should_drop_suidgid(struct mnt_idmap *, struct inode *);
+int file_needs_remove_privs(struct file *);
 extern int file_remove_privs(struct file *);
 int setattr_should_drop_sgid(struct mnt_idmap *idmap,
 			     const struct inode *inode);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] fs: Add and export file_needs_remove_privs
  2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert
@ 2023-09-01  6:48   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-09-01  6:48 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik,
	linux-btrfs, Alexander Viro, Christian Brauner

On Wed, Aug 30, 2023 at 08:15:18PM +0200, Bernd Schubert wrote:
> File systems want to hold a shared lock for DIO writes,
> but may need to drop file priveliges - that a requires an
> exclusive lock. The new export function file_needs_remove_privs()
> is added in order to first check if that is needed.

As said last time - the existing file systems with shared locking for
direct I/O just do the much more pessimistic IS_SEC check here.  I'd
suggest to just do that for btrfs first step.  If we then have numbers
justifying the finer grained check we should update veryone and not
just do it for one place.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock
  2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert
  2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert
@ 2023-08-30 18:15 ` Bernd Schubert
  2023-08-31  7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig
  2023-08-31 10:18 ` Mateusz Guzik
  3 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2023-08-30 18:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Goldwyn Rodrigues,
	Chris Mason, Josef Bacik, David Sterba, linux-btrfs

file_remove_privs might call into notify_change(), which
requires to hold an exclusive lock.
In order to keep the shared lock for most IOs it now first
checks if privilege changes are needed, then switches to
the exclusive lock, rechecks and only then calls file_remove_privs.
This makes usage of the new exported function
file_needs_remove_privs().

The file_remove_privs code path is not optimized, under the
assumption that it would be a rare call (file_remove_privs
calls file_needs_remove_privs a 2nd time).

Fixes: e9adabb9712e ("btrfs: use shared lock for direct writes within EOF")
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
Cc: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/btrfs/file.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fd03e689a6be..89c869ab131d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1125,7 +1125,7 @@ static void update_time_for_write(struct inode *inode)
 }
 
 static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
-			     size_t count)
+			     size_t count, unsigned int *ilock_flags)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
@@ -1145,9 +1145,17 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
 	    !(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
 		return -EAGAIN;
 
-	ret = file_remove_privs(file);
-	if (ret)
-		return ret;
+	ret = file_needs_remove_privs(file);
+	if (ret) {
+		if (ilock_flags && *ilock_flags & BTRFS_ILOCK_SHARED) {
+			*ilock_flags &= ~BTRFS_ILOCK_SHARED;
+			return -EAGAIN;
+		}
+
+		ret = file_remove_privs(file);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * We reserve space for updating the inode when we reserve space for the
@@ -1204,7 +1212,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	if (ret <= 0)
 		goto out;
 
-	ret = btrfs_write_check(iocb, i, ret);
+	ret = btrfs_write_check(iocb, i, ret, NULL);
 	if (ret < 0)
 		goto out;
 
@@ -1462,13 +1470,16 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	ssize_t err;
 	unsigned int ilock_flags = 0;
 	struct iomap_dio *dio;
+	bool has_shared_lock;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		ilock_flags |= BTRFS_ILOCK_TRY;
 
 	/* If the write DIO is within EOF, use a shared lock */
-	if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode))
+	if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode)) {
 		ilock_flags |= BTRFS_ILOCK_SHARED;
+		has_shared_lock = true;
+	}
 
 relock:
 	err = btrfs_inode_lock(BTRFS_I(inode), ilock_flags);
@@ -1481,8 +1492,17 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		return err;
 	}
 
-	err = btrfs_write_check(iocb, from, err);
+	/* might uset BTRFS_ILOCK_SHARED */
+	err = btrfs_write_check(iocb, from, err, &ilock_flags);
 	if (err < 0) {
+		if (err == -EAGAIN && has_shared_lock &&
+		    !(ilock_flags & BTRFS_ILOCK_SHARED)) {
+			btrfs_inode_unlock(BTRFS_I(inode),
+					   ilock_flags | BTRFS_ILOCK_SHARED);
+			has_shared_lock = false;
+			goto relock;
+		}
+
 		btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
 		goto out;
 	}
@@ -1496,6 +1516,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	    pos + iov_iter_count(from) > i_size_read(inode)) {
 		btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
 		ilock_flags &= ~BTRFS_ILOCK_SHARED;
+		has_shared_lock = false;
 		goto relock;
 	}
 
@@ -1632,7 +1653,7 @@ static ssize_t btrfs_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	if (ret || encoded->len == 0)
 		goto out;
 
-	ret = btrfs_write_check(iocb, from, encoded->len);
+	ret = btrfs_write_check(iocb, from, encoded->len, NULL);
 	if (ret < 0)
 		goto out;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert
  2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert
  2023-08-30 18:15 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert
@ 2023-08-31  7:40 ` Christoph Hellwig
  2023-08-31 10:17   ` Bernd Schubert
  2023-08-31 10:18 ` Mateusz Guzik
  3 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-08-31  7:40 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik,
	linux-btrfs, Alexander Viro, Christian Brauner

On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
> While adding shared direct IO write locks to fuse Miklos noticed
> that file_remove_privs() needs an exclusive lock. I then
> noticed that btrfs actually has the same issue as I had in my patch,
> it was calling into that function with a shared lock.
> This series adds a new exported function file_needs_remove_privs(),
> which used by the follow up btrfs patch and will be used by the
> DIO code path in fuse as well. If that function returns any mask
> the shared lock needs to be dropped and replaced by the exclusive
> variant.

FYI, xfs and ext4 use a simple IS_NOSEC check for this.  That has
a lot more false positives, but also is a much faster check in the
fast path.   It might be worth benchmarking which way to go, but
we should be consistent across file systems for the behavior.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-08-31  7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig
@ 2023-08-31 10:17   ` Bernd Schubert
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2023-08-31 10:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel@vger.kernel.org, bernd.schubert@fastmail.fm,
	miklos@szeredi.hu, Dharmendra Singh, Josef Bacik,
	linux-btrfs@vger.kernel.org, Alexander Viro, Christian Brauner

On 8/31/23 09:40, Christoph Hellwig wrote:
> On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
>> While adding shared direct IO write locks to fuse Miklos noticed
>> that file_remove_privs() needs an exclusive lock. I then
>> noticed that btrfs actually has the same issue as I had in my patch,
>> it was calling into that function with a shared lock.
>> This series adds a new exported function file_needs_remove_privs(),
>> which used by the follow up btrfs patch and will be used by the
>> DIO code path in fuse as well. If that function returns any mask
>> the shared lock needs to be dropped and replaced by the exclusive
>> variant.
> 
> FYI, xfs and ext4 use a simple IS_NOSEC check for this.  That has
> a lot more false positives, but also is a much faster check in the
> fast path.   It might be worth benchmarking which way to go, but
> we should be consistent across file systems for the behavior.
> 

Thanks, interesting! It is basically the same as my 
file_needs_remove_privs, as long as IS_NOSEC is set. I can remove the 
new function and export and to change to that check.
Not sure if it is that much faster, but should keep the kernel a bit 
smaller.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert
                   ` (2 preceding siblings ...)
  2023-08-31  7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig
@ 2023-08-31 10:18 ` Mateusz Guzik
  2023-08-31 14:41   ` Bernd Schubert
                     ` (2 more replies)
  3 siblings, 3 replies; 13+ messages in thread
From: Mateusz Guzik @ 2023-08-31 10:18 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik,
	linux-btrfs, Alexander Viro, Christian Brauner

On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
> While adding shared direct IO write locks to fuse Miklos noticed
> that file_remove_privs() needs an exclusive lock. I then
> noticed that btrfs actually has the same issue as I had in my patch,
> it was calling into that function with a shared lock.
> This series adds a new exported function file_needs_remove_privs(),
> which used by the follow up btrfs patch and will be used by the
> DIO code path in fuse as well. If that function returns any mask
> the shared lock needs to be dropped and replaced by the exclusive
> variant.
> 

No comments on the patchset itself.

So I figured an assert should be there on the write lock held, then the
issue would have been automagically reported.

Turns out notify_change has the following:
        WARN_ON_ONCE(!inode_is_locked(inode));

Which expands to:
static inline int rwsem_is_locked(struct rw_semaphore *sem)
{
        return atomic_long_read(&sem->count) != 0;
}

So it does check the lock, except it passes *any* locked state,
including just readers.

According to git blame this regressed from commit 5955102c9984
("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
were replaced with inode_is_locked, which unintentionally provides
weaker guarantees.

I don't see a rwsem helper for wlock check and I don't think it is all
that beneficial to add. Instead, how about a bunch of lockdep, like so:
diff --git a/fs/attr.c b/fs/attr.c
index a8ae5f6d9b16..f47e718766d1 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
        struct timespec64 now;
        unsigned int ia_valid = attr->ia_valid;

-       WARN_ON_ONCE(!inode_is_locked(inode));
+       lockdep_assert_held_write(&inode->i_rwsem);

        error = may_setattr(idmap, inode, ia_valid);
        if (error)

Alternatively hide it behind inode_assert_is_wlocked() or whatever other
name.

I can do the churn if this sounds like a plan.


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-08-31 10:18 ` Mateusz Guzik
@ 2023-08-31 14:41   ` Bernd Schubert
  2023-09-01  6:49   ` Christoph Hellwig
  2023-09-01 11:38   ` Matthew Wilcox
  2 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2023-08-31 14:41 UTC (permalink / raw)
  To: Mateusz Guzik, Bernd Schubert
  Cc: linux-fsdevel, miklos, dsingh, Josef Bacik, linux-btrfs,
	Alexander Viro, Christian Brauner



On 8/31/23 12:18, Mateusz Guzik wrote:
> On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
>> While adding shared direct IO write locks to fuse Miklos noticed
>> that file_remove_privs() needs an exclusive lock. I then
>> noticed that btrfs actually has the same issue as I had in my patch,
>> it was calling into that function with a shared lock.
>> This series adds a new exported function file_needs_remove_privs(),
>> which used by the follow up btrfs patch and will be used by the
>> DIO code path in fuse as well. If that function returns any mask
>> the shared lock needs to be dropped and replaced by the exclusive
>> variant.
>>
> 
> No comments on the patchset itself.
> 
> So I figured an assert should be there on the write lock held, then the
> issue would have been automagically reported.
> 
> Turns out notify_change has the following:
>          WARN_ON_ONCE(!inode_is_locked(inode));
> 
> Which expands to:
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
>          return atomic_long_read(&sem->count) != 0;
> }
> 
> So it does check the lock, except it passes *any* locked state,
> including just readers.
> 
> According to git blame this regressed from commit 5955102c9984
> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> were replaced with inode_is_locked, which unintentionally provides
> weaker guarantees.
> 
> I don't see a rwsem helper for wlock check and I don't think it is all
> that beneficial to add. Instead, how about a bunch of lockdep, like so:
> diff --git a/fs/attr.c b/fs/attr.c
> index a8ae5f6d9b16..f47e718766d1 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
>          struct timespec64 now;
>          unsigned int ia_valid = attr->ia_valid;
> 
> -       WARN_ON_ONCE(!inode_is_locked(inode));
> +       lockdep_assert_held_write(&inode->i_rwsem);
> 
>          error = may_setattr(idmap, inode, ia_valid);
>          if (error)
> 
> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
> name.
> 
> I can do the churn if this sounds like a plan.
> 

I guess that might help to discover these issues. Maybe keep the 
existing WARN_ON_ONCE, as it would annotate a missing lock, when lockdep 
is turned off?

Another code path is file_modified() and I just wonder if there are more 
issues.

btrfs_punch_hole() takes inode->i_mmap_lock. Although I guess that 
should be found by the existing WARN_ON_ONCE? Actually same in 
btrfs_fallocate(). Or does btrfs always have IS_NOSEC?


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-08-31 10:18 ` Mateusz Guzik
  2023-08-31 14:41   ` Bernd Schubert
@ 2023-09-01  6:49   ` Christoph Hellwig
  2023-09-01 11:38   ` Matthew Wilcox
  2 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-09-01  6:49 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh,
	Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner

On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
> Turns out notify_change has the following:
>         WARN_ON_ONCE(!inode_is_locked(inode));
> 
> Which expands to:
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
>         return atomic_long_read(&sem->count) != 0;
> }
> 
> So it does check the lock, except it passes *any* locked state,
> including just readers.
> 
> According to git blame this regressed from commit 5955102c9984
> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> were replaced with inode_is_locked, which unintentionally provides
> weaker guarantees.
> 
> I don't see a rwsem helper for wlock check and I don't think it is all
> that beneficial to add. Instead, how about a bunch of lockdep, like so:

Yes, that's a good idea.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-08-31 10:18 ` Mateusz Guzik
  2023-08-31 14:41   ` Bernd Schubert
  2023-09-01  6:49   ` Christoph Hellwig
@ 2023-09-01 11:38   ` Matthew Wilcox
  2023-09-01 11:47     ` Mateusz Guzik
  2 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2023-09-01 11:38 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh,
	Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner

On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
> So I figured an assert should be there on the write lock held, then the
> issue would have been automagically reported.
> 
> Turns out notify_change has the following:
>         WARN_ON_ONCE(!inode_is_locked(inode));
> 
> Which expands to:
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
>         return atomic_long_read(&sem->count) != 0;
> }
> 
> So it does check the lock, except it passes *any* locked state,
> including just readers.
> 
> According to git blame this regressed from commit 5955102c9984
> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> were replaced with inode_is_locked, which unintentionally provides
> weaker guarantees.
> 
> I don't see a rwsem helper for wlock check and I don't think it is all
> that beneficial to add. Instead, how about a bunch of lockdep, like so:
> diff --git a/fs/attr.c b/fs/attr.c
> index a8ae5f6d9b16..f47e718766d1 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
>         struct timespec64 now;
>         unsigned int ia_valid = attr->ia_valid;
> 
> -       WARN_ON_ONCE(!inode_is_locked(inode));
> +       lockdep_assert_held_write(&inode->i_rwsem);
> 
>         error = may_setattr(idmap, inode, ia_valid);
>         if (error)
> 
> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
> name.

Better to do it like mmap_lock:

static inline void mmap_assert_write_locked(struct mm_struct *mm)
{
        lockdep_assert_held_write(&mm->mmap_lock);
        VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
}


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-09-01 11:38   ` Matthew Wilcox
@ 2023-09-01 11:47     ` Mateusz Guzik
  2023-09-06 15:13       ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Mateusz Guzik @ 2023-09-01 11:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh,
	Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner

On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote:
> On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
>> So I figured an assert should be there on the write lock held, then the
>> issue would have been automagically reported.
>>
>> Turns out notify_change has the following:
>>         WARN_ON_ONCE(!inode_is_locked(inode));
>>
>> Which expands to:
>> static inline int rwsem_is_locked(struct rw_semaphore *sem)
>> {
>>         return atomic_long_read(&sem->count) != 0;
>> }
>>
>> So it does check the lock, except it passes *any* locked state,
>> including just readers.
>>
>> According to git blame this regressed from commit 5955102c9984
>> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
>> were replaced with inode_is_locked, which unintentionally provides
>> weaker guarantees.
>>
>> I don't see a rwsem helper for wlock check and I don't think it is all
>> that beneficial to add. Instead, how about a bunch of lockdep, like so:
>> diff --git a/fs/attr.c b/fs/attr.c
>> index a8ae5f6d9b16..f47e718766d1 100644
>> --- a/fs/attr.c
>> +++ b/fs/attr.c
>> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct
>> dentry *dentry,
>>         struct timespec64 now;
>>         unsigned int ia_valid = attr->ia_valid;
>>
>> -       WARN_ON_ONCE(!inode_is_locked(inode));
>> +       lockdep_assert_held_write(&inode->i_rwsem);
>>
>>         error = may_setattr(idmap, inode, ia_valid);
>>         if (error)
>>
>> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
>> name.
>
> Better to do it like mmap_lock:
>
> static inline void mmap_assert_write_locked(struct mm_struct *mm)
> {
>         lockdep_assert_held_write(&mm->mmap_lock);
>         VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> }
>

May I suggest continuing this with responses to the patch I sent? ;)
[RFC PATCH] vfs: add inode lockdep assertions on -fsdevel

I made it line up with asserts for i_mmap_rwsem.

btw your non-lockdep check suffers the very problem I'm trying to fix
here -- checking for *any* locked state.

-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs
  2023-09-01 11:47     ` Mateusz Guzik
@ 2023-09-06 15:13       ` Matthew Wilcox
  0 siblings, 0 replies; 13+ messages in thread
From: Matthew Wilcox @ 2023-09-06 15:13 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh,
	Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner

On Fri, Sep 01, 2023 at 01:47:10PM +0200, Mateusz Guzik wrote:
> On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote:
> > On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
> >> So I figured an assert should be there on the write lock held, then the
> >> issue would have been automagically reported.
> >>
> >> Turns out notify_change has the following:
> >>         WARN_ON_ONCE(!inode_is_locked(inode));
> >>
> >> Which expands to:
> >> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> >> {
> >>         return atomic_long_read(&sem->count) != 0;
> >> }
> >>
> >> So it does check the lock, except it passes *any* locked state,
> >> including just readers.
> >>
> >> According to git blame this regressed from commit 5955102c9984
> >> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> >> were replaced with inode_is_locked, which unintentionally provides
> >> weaker guarantees.
> >>
> >> I don't see a rwsem helper for wlock check and I don't think it is all
> >> that beneficial to add. Instead, how about a bunch of lockdep, like so:
> >> diff --git a/fs/attr.c b/fs/attr.c
> >> index a8ae5f6d9b16..f47e718766d1 100644
> >> --- a/fs/attr.c
> >> +++ b/fs/attr.c
> >> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct
> >> dentry *dentry,
> >>         struct timespec64 now;
> >>         unsigned int ia_valid = attr->ia_valid;
> >>
> >> -       WARN_ON_ONCE(!inode_is_locked(inode));
> >> +       lockdep_assert_held_write(&inode->i_rwsem);
> >>
> >>         error = may_setattr(idmap, inode, ia_valid);
> >>         if (error)
> >>
> >> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
> >> name.
> >
> > Better to do it like mmap_lock:
> >
> > static inline void mmap_assert_write_locked(struct mm_struct *mm)
> > {
> >         lockdep_assert_held_write(&mm->mmap_lock);
> >         VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> > }
> >
> 
> May I suggest continuing this with responses to the patch I sent? ;)

That's annoying.  Don't split this kind of conversation up if you don't
have to.

> [RFC PATCH] vfs: add inode lockdep assertions on -fsdevel
> 
> I made it line up with asserts for i_mmap_rwsem.
> 
> btw your non-lockdep check suffers the very problem I'm trying to fix
> here -- checking for *any* locked state.

I'll respond to this over there then.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 0/2] Use exclusive lock for file_remove_privs
@ 2023-08-31 11:24 Bernd Schubert
  2023-08-31 11:24 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert
  0 siblings, 1 reply; 13+ messages in thread
From: Bernd Schubert @ 2023-08-31 11:24 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Josef Bacik,
	linux-btrfs, Alexander Viro, Christian Brauner

While adding shared direct IO write locks to fuse Miklos noticed
that file_remove_privs() needs an exclusive lock. I then
noticed that btrfs actually has the same issue as I had in my patch,
it was calling into that function with a shared lock.
This series adds a new exported function file_needs_remove_privs(),
which used by the follow up btrfs patch and will be used by the
DIO code path in fuse as well. If that function returns any mask
the shared lock needs to be dropped and replaced by the exclusive
variant.

Note: Compilation tested only.

v2:
Already check for IS_NOSEC in btrfs_direct_write before the first
lock is taken.
Slight modification to make the code easier to read (boolean pointer
is passed to btrfs_write_check, instead of flags).

Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org

Bernd Schubert (2):
  fs: Add and export file_needs_remove_privs
  btrfs: file_remove_privs needs an exclusive lock

 fs/btrfs/file.c    | 37 +++++++++++++++++++++++++++++--------
 fs/inode.c         |  8 ++++++++
 include/linux/fs.h |  1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

-- 
2.39.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock
  2023-08-31 11:24 [PATCH v2 " Bernd Schubert
@ 2023-08-31 11:24 ` Bernd Schubert
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2023-08-31 11:24 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Christoph Hellwig,
	Goldwyn Rodrigues, Chris Mason, Josef Bacik, David Sterba,
	linux-btrfs

file_remove_privs might call into notify_change(), which
requires to hold an exclusive lock.
In order to keep the shared lock for most IOs it now first
checks if privilege changes are needed, then switches to
the exclusive lock, rechecks and only then calls file_remove_privs.
This makes usage of the new exported function
file_needs_remove_privs().

The file_remove_privs code path is not optimized, under the
assumption that it would be a rare call (file_remove_privs
calls file_needs_remove_privs a 2nd time).

Fixes: e9adabb9712e ("btrfs: use shared lock for direct writes within EOF")
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
Cc: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/btrfs/file.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fd03e689a6be..3162ec245d57 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1125,7 +1125,7 @@ static void update_time_for_write(struct inode *inode)
 }
 
 static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
-			     size_t count)
+			     size_t count, bool *shared_lock)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
@@ -1145,9 +1145,17 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
 	    !(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
 		return -EAGAIN;
 
-	ret = file_remove_privs(file);
-	if (ret)
-		return ret;
+	ret = file_needs_remove_privs(file);
+	if (ret) {
+		if (shared_lock && *shared_lock) {
+			*shared_lock = false;
+			return -EAGAIN;
+		}
+
+		ret = file_remove_privs(file);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * We reserve space for updating the inode when we reserve space for the
@@ -1204,7 +1212,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	if (ret <= 0)
 		goto out;
 
-	ret = btrfs_write_check(iocb, i, ret);
+	ret = btrfs_write_check(iocb, i, ret, NULL);
 	if (ret < 0)
 		goto out;
 
@@ -1462,13 +1470,20 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	ssize_t err;
 	unsigned int ilock_flags = 0;
 	struct iomap_dio *dio;
+	bool shared_lock;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		ilock_flags |= BTRFS_ILOCK_TRY;
 
-	/* If the write DIO is within EOF, use a shared lock */
-	if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode))
+	/* If the write DIO is within EOF, use a shared lock and also only
+	 * if security bits will likely not be dropped. Either will need
+	 * to be rechecked after the lock was acquired.
+	 */
+	if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode) &&
+	    IS_NOSEC(inode)) {
 		ilock_flags |= BTRFS_ILOCK_SHARED;
+		shared_lock = true;
+	}
 
 relock:
 	err = btrfs_inode_lock(BTRFS_I(inode), ilock_flags);
@@ -1481,8 +1496,15 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		return err;
 	}
 
-	err = btrfs_write_check(iocb, from, err);
+	err = btrfs_write_check(iocb, from, err, &shared_lock);
 	if (err < 0) {
+		if (err == -EAGAIN && ilock_flags & BTRFS_ILOCK_SHARED &&
+		    !shared_lock) {
+			btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
+			ilock_flags &= ~BTRFS_ILOCK_SHARED;
+			goto relock;
+		}
+
 		btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
 		goto out;
 	}
@@ -1496,6 +1518,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	    pos + iov_iter_count(from) > i_size_read(inode)) {
 		btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
 		ilock_flags &= ~BTRFS_ILOCK_SHARED;
+		shared_lock = false;
 		goto relock;
 	}
 
@@ -1632,7 +1655,7 @@ static ssize_t btrfs_encoded_write(struct kiocb *iocb, struct iov_iter *from,
 	if (ret || encoded->len == 0)
 		goto out;
 
-	ret = btrfs_write_check(iocb, from, encoded->len);
+	ret = btrfs_write_check(iocb, from, encoded->len, NULL);
 	if (ret < 0)
 		goto out;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-09-06 15:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert
2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert
2023-09-01  6:48   ` Christoph Hellwig
2023-08-30 18:15 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert
2023-08-31  7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig
2023-08-31 10:17   ` Bernd Schubert
2023-08-31 10:18 ` Mateusz Guzik
2023-08-31 14:41   ` Bernd Schubert
2023-09-01  6:49   ` Christoph Hellwig
2023-09-01 11:38   ` Matthew Wilcox
2023-09-01 11:47     ` Mateusz Guzik
2023-09-06 15:13       ` Matthew Wilcox
  -- strict thread matches above, loose matches on Subject: below --
2023-08-31 11:24 [PATCH v2 " Bernd Schubert
2023-08-31 11:24 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox