* [PATCH 0/2] Use exclusive lock for file_remove_privs
@ 2023-08-30 18:15 Bernd Schubert
2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Bernd Schubert @ 2023-08-30 18:15 UTC (permalink / raw)
To: linux-fsdevel
Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Josef Bacik,
linux-btrfs, Alexander Viro, Christian Brauner
While adding shared direct IO write locks to fuse Miklos noticed
that file_remove_privs() needs an exclusive lock. I then
noticed that btrfs actually has the same issue as I had in my patch,
it was calling into that function with a shared lock.
This series adds a new exported function file_needs_remove_privs(),
which used by the follow up btrfs patch and will be used by the
DIO code path in fuse as well. If that function returns any mask
the shared lock needs to be dropped and replaced by the exclusive
variant.
Note: Compilation tested only.
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Bernd Schubert (2):
fs: Add and export file_needs_remove_privs
btrfs: file_remove_privs needs an exclusive lock
fs/btrfs/file.c | 37 +++++++++++++++++++++++++++++--------
fs/inode.c | 8 ++++++++
include/linux/fs.h | 1 +
3 files changed, 38 insertions(+), 8 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH 1/2] fs: Add and export file_needs_remove_privs 2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert @ 2023-08-30 18:15 ` Bernd Schubert 2023-09-01 6:48 ` Christoph Hellwig 2023-08-30 18:15 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert ` (2 subsequent siblings) 3 siblings, 1 reply; 13+ messages in thread From: Bernd Schubert @ 2023-08-30 18:15 UTC (permalink / raw) To: linux-fsdevel Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner File systems want to hold a shared lock for DIO writes, but may need to drop file priveliges - that a requires an exclusive lock. The new export function file_needs_remove_privs() is added in order to first check if that is needed. Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Dharmendra Singh <dsingh@ddn.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: linux-btrfs@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Bernd Schubert <bschubert@ddn.com> --- fs/inode.c | 8 ++++++++ include/linux/fs.h | 1 + 2 files changed, 9 insertions(+) diff --git a/fs/inode.c b/fs/inode.c index 67611a360031..9b05db602e41 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2013,6 +2013,14 @@ int dentry_needs_remove_privs(struct mnt_idmap *idmap, return mask; } +int file_needs_remove_privs(struct file *file) +{ + struct dentry *dentry = file_dentry(file); + + return dentry_needs_remove_privs(file_mnt_idmap(file), dentry); +} +EXPORT_SYMBOL_GPL(file_needs_remove_privs); + static int __remove_privs(struct mnt_idmap *idmap, struct dentry *dentry, int kill) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 562f2623c9c9..9245f0de00bc 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2721,6 +2721,7 @@ extern struct inode *new_inode_pseudo(struct super_block *sb); extern struct inode *new_inode(struct super_block *sb); extern void free_inode_nonrcu(struct inode *inode); extern int setattr_should_drop_suidgid(struct mnt_idmap *, struct inode *); +int file_needs_remove_privs(struct file *); extern int file_remove_privs(struct file *); int setattr_should_drop_sgid(struct mnt_idmap *idmap, const struct inode *inode); -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] fs: Add and export file_needs_remove_privs 2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert @ 2023-09-01 6:48 ` Christoph Hellwig 0 siblings, 0 replies; 13+ messages in thread From: Christoph Hellwig @ 2023-09-01 6:48 UTC (permalink / raw) To: Bernd Schubert Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On Wed, Aug 30, 2023 at 08:15:18PM +0200, Bernd Schubert wrote: > File systems want to hold a shared lock for DIO writes, > but may need to drop file priveliges - that a requires an > exclusive lock. The new export function file_needs_remove_privs() > is added in order to first check if that is needed. As said last time - the existing file systems with shared locking for direct I/O just do the much more pessimistic IS_SEC check here. I'd suggest to just do that for btrfs first step. If we then have numbers justifying the finer grained check we should update veryone and not just do it for one place. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock 2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert 2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert @ 2023-08-30 18:15 ` Bernd Schubert 2023-08-31 7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig 2023-08-31 10:18 ` Mateusz Guzik 3 siblings, 0 replies; 13+ messages in thread From: Bernd Schubert @ 2023-08-30 18:15 UTC (permalink / raw) To: linux-fsdevel Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Goldwyn Rodrigues, Chris Mason, Josef Bacik, David Sterba, linux-btrfs file_remove_privs might call into notify_change(), which requires to hold an exclusive lock. In order to keep the shared lock for most IOs it now first checks if privilege changes are needed, then switches to the exclusive lock, rechecks and only then calls file_remove_privs. This makes usage of the new exported function file_needs_remove_privs(). The file_remove_privs code path is not optimized, under the assumption that it would be a rare call (file_remove_privs calls file_needs_remove_privs a 2nd time). Fixes: e9adabb9712e ("btrfs: use shared lock for direct writes within EOF") Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Dharmendra Singh <dsingh@ddn.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: David Sterba <dsterba@suse.com> Cc: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Bernd Schubert <bschubert@ddn.com> --- fs/btrfs/file.c | 37 +++++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index fd03e689a6be..89c869ab131d 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1125,7 +1125,7 @@ static void update_time_for_write(struct inode *inode) } static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, - size_t count) + size_t count, unsigned int *ilock_flags) { struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); @@ -1145,9 +1145,17 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, !(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC))) return -EAGAIN; - ret = file_remove_privs(file); - if (ret) - return ret; + ret = file_needs_remove_privs(file); + if (ret) { + if (ilock_flags && *ilock_flags & BTRFS_ILOCK_SHARED) { + *ilock_flags &= ~BTRFS_ILOCK_SHARED; + return -EAGAIN; + } + + ret = file_remove_privs(file); + if (ret) + return ret; + } /* * We reserve space for updating the inode when we reserve space for the @@ -1204,7 +1212,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, if (ret <= 0) goto out; - ret = btrfs_write_check(iocb, i, ret); + ret = btrfs_write_check(iocb, i, ret, NULL); if (ret < 0) goto out; @@ -1462,13 +1470,16 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) ssize_t err; unsigned int ilock_flags = 0; struct iomap_dio *dio; + bool has_shared_lock; if (iocb->ki_flags & IOCB_NOWAIT) ilock_flags |= BTRFS_ILOCK_TRY; /* If the write DIO is within EOF, use a shared lock */ - if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode)) + if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode)) { ilock_flags |= BTRFS_ILOCK_SHARED; + has_shared_lock = true; + } relock: err = btrfs_inode_lock(BTRFS_I(inode), ilock_flags); @@ -1481,8 +1492,17 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) return err; } - err = btrfs_write_check(iocb, from, err); + /* might uset BTRFS_ILOCK_SHARED */ + err = btrfs_write_check(iocb, from, err, &ilock_flags); if (err < 0) { + if (err == -EAGAIN && has_shared_lock && + !(ilock_flags & BTRFS_ILOCK_SHARED)) { + btrfs_inode_unlock(BTRFS_I(inode), + ilock_flags | BTRFS_ILOCK_SHARED); + has_shared_lock = false; + goto relock; + } + btrfs_inode_unlock(BTRFS_I(inode), ilock_flags); goto out; } @@ -1496,6 +1516,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) pos + iov_iter_count(from) > i_size_read(inode)) { btrfs_inode_unlock(BTRFS_I(inode), ilock_flags); ilock_flags &= ~BTRFS_ILOCK_SHARED; + has_shared_lock = false; goto relock; } @@ -1632,7 +1653,7 @@ static ssize_t btrfs_encoded_write(struct kiocb *iocb, struct iov_iter *from, if (ret || encoded->len == 0) goto out; - ret = btrfs_write_check(iocb, from, encoded->len); + ret = btrfs_write_check(iocb, from, encoded->len, NULL); if (ret < 0) goto out; -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert 2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert 2023-08-30 18:15 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert @ 2023-08-31 7:40 ` Christoph Hellwig 2023-08-31 10:17 ` Bernd Schubert 2023-08-31 10:18 ` Mateusz Guzik 3 siblings, 1 reply; 13+ messages in thread From: Christoph Hellwig @ 2023-08-31 7:40 UTC (permalink / raw) To: Bernd Schubert Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: > While adding shared direct IO write locks to fuse Miklos noticed > that file_remove_privs() needs an exclusive lock. I then > noticed that btrfs actually has the same issue as I had in my patch, > it was calling into that function with a shared lock. > This series adds a new exported function file_needs_remove_privs(), > which used by the follow up btrfs patch and will be used by the > DIO code path in fuse as well. If that function returns any mask > the shared lock needs to be dropped and replaced by the exclusive > variant. FYI, xfs and ext4 use a simple IS_NOSEC check for this. That has a lot more false positives, but also is a much faster check in the fast path. It might be worth benchmarking which way to go, but we should be consistent across file systems for the behavior. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-08-31 7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig @ 2023-08-31 10:17 ` Bernd Schubert 0 siblings, 0 replies; 13+ messages in thread From: Bernd Schubert @ 2023-08-31 10:17 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-fsdevel@vger.kernel.org, bernd.schubert@fastmail.fm, miklos@szeredi.hu, Dharmendra Singh, Josef Bacik, linux-btrfs@vger.kernel.org, Alexander Viro, Christian Brauner On 8/31/23 09:40, Christoph Hellwig wrote: > On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: >> While adding shared direct IO write locks to fuse Miklos noticed >> that file_remove_privs() needs an exclusive lock. I then >> noticed that btrfs actually has the same issue as I had in my patch, >> it was calling into that function with a shared lock. >> This series adds a new exported function file_needs_remove_privs(), >> which used by the follow up btrfs patch and will be used by the >> DIO code path in fuse as well. If that function returns any mask >> the shared lock needs to be dropped and replaced by the exclusive >> variant. > > FYI, xfs and ext4 use a simple IS_NOSEC check for this. That has > a lot more false positives, but also is a much faster check in the > fast path. It might be worth benchmarking which way to go, but > we should be consistent across file systems for the behavior. > Thanks, interesting! It is basically the same as my file_needs_remove_privs, as long as IS_NOSEC is set. I can remove the new function and export and to change to that check. Not sure if it is that much faster, but should keep the kernel a bit smaller. Thanks, Bernd ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert ` (2 preceding siblings ...) 2023-08-31 7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig @ 2023-08-31 10:18 ` Mateusz Guzik 2023-08-31 14:41 ` Bernd Schubert ` (2 more replies) 3 siblings, 3 replies; 13+ messages in thread From: Mateusz Guzik @ 2023-08-31 10:18 UTC (permalink / raw) To: Bernd Schubert Cc: linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: > While adding shared direct IO write locks to fuse Miklos noticed > that file_remove_privs() needs an exclusive lock. I then > noticed that btrfs actually has the same issue as I had in my patch, > it was calling into that function with a shared lock. > This series adds a new exported function file_needs_remove_privs(), > which used by the follow up btrfs patch and will be used by the > DIO code path in fuse as well. If that function returns any mask > the shared lock needs to be dropped and replaced by the exclusive > variant. > No comments on the patchset itself. So I figured an assert should be there on the write lock held, then the issue would have been automagically reported. Turns out notify_change has the following: WARN_ON_ONCE(!inode_is_locked(inode)); Which expands to: static inline int rwsem_is_locked(struct rw_semaphore *sem) { return atomic_long_read(&sem->count) != 0; } So it does check the lock, except it passes *any* locked state, including just readers. According to git blame this regressed from commit 5955102c9984 ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked were replaced with inode_is_locked, which unintentionally provides weaker guarantees. I don't see a rwsem helper for wlock check and I don't think it is all that beneficial to add. Instead, how about a bunch of lockdep, like so: diff --git a/fs/attr.c b/fs/attr.c index a8ae5f6d9b16..f47e718766d1 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry, struct timespec64 now; unsigned int ia_valid = attr->ia_valid; - WARN_ON_ONCE(!inode_is_locked(inode)); + lockdep_assert_held_write(&inode->i_rwsem); error = may_setattr(idmap, inode, ia_valid); if (error) Alternatively hide it behind inode_assert_is_wlocked() or whatever other name. I can do the churn if this sounds like a plan. ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-08-31 10:18 ` Mateusz Guzik @ 2023-08-31 14:41 ` Bernd Schubert 2023-09-01 6:49 ` Christoph Hellwig 2023-09-01 11:38 ` Matthew Wilcox 2 siblings, 0 replies; 13+ messages in thread From: Bernd Schubert @ 2023-08-31 14:41 UTC (permalink / raw) To: Mateusz Guzik, Bernd Schubert Cc: linux-fsdevel, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On 8/31/23 12:18, Mateusz Guzik wrote: > On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: >> While adding shared direct IO write locks to fuse Miklos noticed >> that file_remove_privs() needs an exclusive lock. I then >> noticed that btrfs actually has the same issue as I had in my patch, >> it was calling into that function with a shared lock. >> This series adds a new exported function file_needs_remove_privs(), >> which used by the follow up btrfs patch and will be used by the >> DIO code path in fuse as well. If that function returns any mask >> the shared lock needs to be dropped and replaced by the exclusive >> variant. >> > > No comments on the patchset itself. > > So I figured an assert should be there on the write lock held, then the > issue would have been automagically reported. > > Turns out notify_change has the following: > WARN_ON_ONCE(!inode_is_locked(inode)); > > Which expands to: > static inline int rwsem_is_locked(struct rw_semaphore *sem) > { > return atomic_long_read(&sem->count) != 0; > } > > So it does check the lock, except it passes *any* locked state, > including just readers. > > According to git blame this regressed from commit 5955102c9984 > ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > were replaced with inode_is_locked, which unintentionally provides > weaker guarantees. > > I don't see a rwsem helper for wlock check and I don't think it is all > that beneficial to add. Instead, how about a bunch of lockdep, like so: > diff --git a/fs/attr.c b/fs/attr.c > index a8ae5f6d9b16..f47e718766d1 100644 > --- a/fs/attr.c > +++ b/fs/attr.c > @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry, > struct timespec64 now; > unsigned int ia_valid = attr->ia_valid; > > - WARN_ON_ONCE(!inode_is_locked(inode)); > + lockdep_assert_held_write(&inode->i_rwsem); > > error = may_setattr(idmap, inode, ia_valid); > if (error) > > Alternatively hide it behind inode_assert_is_wlocked() or whatever other > name. > > I can do the churn if this sounds like a plan. > I guess that might help to discover these issues. Maybe keep the existing WARN_ON_ONCE, as it would annotate a missing lock, when lockdep is turned off? Another code path is file_modified() and I just wonder if there are more issues. btrfs_punch_hole() takes inode->i_mmap_lock. Although I guess that should be found by the existing WARN_ON_ONCE? Actually same in btrfs_fallocate(). Or does btrfs always have IS_NOSEC? Thanks, Bernd ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-08-31 10:18 ` Mateusz Guzik 2023-08-31 14:41 ` Bernd Schubert @ 2023-09-01 6:49 ` Christoph Hellwig 2023-09-01 11:38 ` Matthew Wilcox 2 siblings, 0 replies; 13+ messages in thread From: Christoph Hellwig @ 2023-09-01 6:49 UTC (permalink / raw) To: Mateusz Guzik Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: > Turns out notify_change has the following: > WARN_ON_ONCE(!inode_is_locked(inode)); > > Which expands to: > static inline int rwsem_is_locked(struct rw_semaphore *sem) > { > return atomic_long_read(&sem->count) != 0; > } > > So it does check the lock, except it passes *any* locked state, > including just readers. > > According to git blame this regressed from commit 5955102c9984 > ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > were replaced with inode_is_locked, which unintentionally provides > weaker guarantees. > > I don't see a rwsem helper for wlock check and I don't think it is all > that beneficial to add. Instead, how about a bunch of lockdep, like so: Yes, that's a good idea. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-08-31 10:18 ` Mateusz Guzik 2023-08-31 14:41 ` Bernd Schubert 2023-09-01 6:49 ` Christoph Hellwig @ 2023-09-01 11:38 ` Matthew Wilcox 2023-09-01 11:47 ` Mateusz Guzik 2 siblings, 1 reply; 13+ messages in thread From: Matthew Wilcox @ 2023-09-01 11:38 UTC (permalink / raw) To: Mateusz Guzik Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: > So I figured an assert should be there on the write lock held, then the > issue would have been automagically reported. > > Turns out notify_change has the following: > WARN_ON_ONCE(!inode_is_locked(inode)); > > Which expands to: > static inline int rwsem_is_locked(struct rw_semaphore *sem) > { > return atomic_long_read(&sem->count) != 0; > } > > So it does check the lock, except it passes *any* locked state, > including just readers. > > According to git blame this regressed from commit 5955102c9984 > ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > were replaced with inode_is_locked, which unintentionally provides > weaker guarantees. > > I don't see a rwsem helper for wlock check and I don't think it is all > that beneficial to add. Instead, how about a bunch of lockdep, like so: > diff --git a/fs/attr.c b/fs/attr.c > index a8ae5f6d9b16..f47e718766d1 100644 > --- a/fs/attr.c > +++ b/fs/attr.c > @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry, > struct timespec64 now; > unsigned int ia_valid = attr->ia_valid; > > - WARN_ON_ONCE(!inode_is_locked(inode)); > + lockdep_assert_held_write(&inode->i_rwsem); > > error = may_setattr(idmap, inode, ia_valid); > if (error) > > Alternatively hide it behind inode_assert_is_wlocked() or whatever other > name. Better to do it like mmap_lock: static inline void mmap_assert_write_locked(struct mm_struct *mm) { lockdep_assert_held_write(&mm->mmap_lock); VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); } ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-09-01 11:38 ` Matthew Wilcox @ 2023-09-01 11:47 ` Mateusz Guzik 2023-09-06 15:13 ` Matthew Wilcox 0 siblings, 1 reply; 13+ messages in thread From: Mateusz Guzik @ 2023-09-01 11:47 UTC (permalink / raw) To: Matthew Wilcox Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote: > On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: >> So I figured an assert should be there on the write lock held, then the >> issue would have been automagically reported. >> >> Turns out notify_change has the following: >> WARN_ON_ONCE(!inode_is_locked(inode)); >> >> Which expands to: >> static inline int rwsem_is_locked(struct rw_semaphore *sem) >> { >> return atomic_long_read(&sem->count) != 0; >> } >> >> So it does check the lock, except it passes *any* locked state, >> including just readers. >> >> According to git blame this regressed from commit 5955102c9984 >> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked >> were replaced with inode_is_locked, which unintentionally provides >> weaker guarantees. >> >> I don't see a rwsem helper for wlock check and I don't think it is all >> that beneficial to add. Instead, how about a bunch of lockdep, like so: >> diff --git a/fs/attr.c b/fs/attr.c >> index a8ae5f6d9b16..f47e718766d1 100644 >> --- a/fs/attr.c >> +++ b/fs/attr.c >> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct >> dentry *dentry, >> struct timespec64 now; >> unsigned int ia_valid = attr->ia_valid; >> >> - WARN_ON_ONCE(!inode_is_locked(inode)); >> + lockdep_assert_held_write(&inode->i_rwsem); >> >> error = may_setattr(idmap, inode, ia_valid); >> if (error) >> >> Alternatively hide it behind inode_assert_is_wlocked() or whatever other >> name. > > Better to do it like mmap_lock: > > static inline void mmap_assert_write_locked(struct mm_struct *mm) > { > lockdep_assert_held_write(&mm->mmap_lock); > VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); > } > May I suggest continuing this with responses to the patch I sent? ;) [RFC PATCH] vfs: add inode lockdep assertions on -fsdevel I made it line up with asserts for i_mmap_rwsem. btw your non-lockdep check suffers the very problem I'm trying to fix here -- checking for *any* locked state. -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Use exclusive lock for file_remove_privs 2023-09-01 11:47 ` Mateusz Guzik @ 2023-09-06 15:13 ` Matthew Wilcox 0 siblings, 0 replies; 13+ messages in thread From: Matthew Wilcox @ 2023-09-06 15:13 UTC (permalink / raw) To: Mateusz Guzik Cc: Bernd Schubert, linux-fsdevel, bernd.schubert, miklos, dsingh, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner On Fri, Sep 01, 2023 at 01:47:10PM +0200, Mateusz Guzik wrote: > On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote: > > On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: > >> So I figured an assert should be there on the write lock held, then the > >> issue would have been automagically reported. > >> > >> Turns out notify_change has the following: > >> WARN_ON_ONCE(!inode_is_locked(inode)); > >> > >> Which expands to: > >> static inline int rwsem_is_locked(struct rw_semaphore *sem) > >> { > >> return atomic_long_read(&sem->count) != 0; > >> } > >> > >> So it does check the lock, except it passes *any* locked state, > >> including just readers. > >> > >> According to git blame this regressed from commit 5955102c9984 > >> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > >> were replaced with inode_is_locked, which unintentionally provides > >> weaker guarantees. > >> > >> I don't see a rwsem helper for wlock check and I don't think it is all > >> that beneficial to add. Instead, how about a bunch of lockdep, like so: > >> diff --git a/fs/attr.c b/fs/attr.c > >> index a8ae5f6d9b16..f47e718766d1 100644 > >> --- a/fs/attr.c > >> +++ b/fs/attr.c > >> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct > >> dentry *dentry, > >> struct timespec64 now; > >> unsigned int ia_valid = attr->ia_valid; > >> > >> - WARN_ON_ONCE(!inode_is_locked(inode)); > >> + lockdep_assert_held_write(&inode->i_rwsem); > >> > >> error = may_setattr(idmap, inode, ia_valid); > >> if (error) > >> > >> Alternatively hide it behind inode_assert_is_wlocked() or whatever other > >> name. > > > > Better to do it like mmap_lock: > > > > static inline void mmap_assert_write_locked(struct mm_struct *mm) > > { > > lockdep_assert_held_write(&mm->mmap_lock); > > VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); > > } > > > > May I suggest continuing this with responses to the patch I sent? ;) That's annoying. Don't split this kind of conversation up if you don't have to. > [RFC PATCH] vfs: add inode lockdep assertions on -fsdevel > > I made it line up with asserts for i_mmap_rwsem. > > btw your non-lockdep check suffers the very problem I'm trying to fix > here -- checking for *any* locked state. I'll respond to this over there then. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 0/2] Use exclusive lock for file_remove_privs @ 2023-08-31 11:24 Bernd Schubert 2023-08-31 11:24 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert 0 siblings, 1 reply; 13+ messages in thread From: Bernd Schubert @ 2023-08-31 11:24 UTC (permalink / raw) To: linux-fsdevel Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Josef Bacik, linux-btrfs, Alexander Viro, Christian Brauner While adding shared direct IO write locks to fuse Miklos noticed that file_remove_privs() needs an exclusive lock. I then noticed that btrfs actually has the same issue as I had in my patch, it was calling into that function with a shared lock. This series adds a new exported function file_needs_remove_privs(), which used by the follow up btrfs patch and will be used by the DIO code path in fuse as well. If that function returns any mask the shared lock needs to be dropped and replaced by the exclusive variant. Note: Compilation tested only. v2: Already check for IS_NOSEC in btrfs_direct_write before the first lock is taken. Slight modification to make the code easier to read (boolean pointer is passed to btrfs_write_check, instead of flags). Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Dharmendra Singh <dsingh@ddn.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: linux-btrfs@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Bernd Schubert (2): fs: Add and export file_needs_remove_privs btrfs: file_remove_privs needs an exclusive lock fs/btrfs/file.c | 37 +++++++++++++++++++++++++++++-------- fs/inode.c | 8 ++++++++ include/linux/fs.h | 1 + 3 files changed, 38 insertions(+), 8 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock 2023-08-31 11:24 [PATCH v2 " Bernd Schubert @ 2023-08-31 11:24 ` Bernd Schubert 0 siblings, 0 replies; 13+ messages in thread From: Bernd Schubert @ 2023-08-31 11:24 UTC (permalink / raw) To: linux-fsdevel Cc: bernd.schubert, miklos, dsingh, Bernd Schubert, Christoph Hellwig, Goldwyn Rodrigues, Chris Mason, Josef Bacik, David Sterba, linux-btrfs file_remove_privs might call into notify_change(), which requires to hold an exclusive lock. In order to keep the shared lock for most IOs it now first checks if privilege changes are needed, then switches to the exclusive lock, rechecks and only then calls file_remove_privs. This makes usage of the new exported function file_needs_remove_privs(). The file_remove_privs code path is not optimized, under the assumption that it would be a rare call (file_remove_privs calls file_needs_remove_privs a 2nd time). Fixes: e9adabb9712e ("btrfs: use shared lock for direct writes within EOF") Cc: Christoph Hellwig <hch@infradead.org> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Dharmendra Singh <dsingh@ddn.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: David Sterba <dsterba@suse.com> Cc: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Bernd Schubert <bschubert@ddn.com> --- fs/btrfs/file.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index fd03e689a6be..3162ec245d57 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1125,7 +1125,7 @@ static void update_time_for_write(struct inode *inode) } static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, - size_t count) + size_t count, bool *shared_lock) { struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); @@ -1145,9 +1145,17 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, !(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC))) return -EAGAIN; - ret = file_remove_privs(file); - if (ret) - return ret; + ret = file_needs_remove_privs(file); + if (ret) { + if (shared_lock && *shared_lock) { + *shared_lock = false; + return -EAGAIN; + } + + ret = file_remove_privs(file); + if (ret) + return ret; + } /* * We reserve space for updating the inode when we reserve space for the @@ -1204,7 +1212,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, if (ret <= 0) goto out; - ret = btrfs_write_check(iocb, i, ret); + ret = btrfs_write_check(iocb, i, ret, NULL); if (ret < 0) goto out; @@ -1462,13 +1470,20 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) ssize_t err; unsigned int ilock_flags = 0; struct iomap_dio *dio; + bool shared_lock; if (iocb->ki_flags & IOCB_NOWAIT) ilock_flags |= BTRFS_ILOCK_TRY; - /* If the write DIO is within EOF, use a shared lock */ - if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode)) + /* If the write DIO is within EOF, use a shared lock and also only + * if security bits will likely not be dropped. Either will need + * to be rechecked after the lock was acquired. + */ + if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode) && + IS_NOSEC(inode)) { ilock_flags |= BTRFS_ILOCK_SHARED; + shared_lock = true; + } relock: err = btrfs_inode_lock(BTRFS_I(inode), ilock_flags); @@ -1481,8 +1496,15 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) return err; } - err = btrfs_write_check(iocb, from, err); + err = btrfs_write_check(iocb, from, err, &shared_lock); if (err < 0) { + if (err == -EAGAIN && ilock_flags & BTRFS_ILOCK_SHARED && + !shared_lock) { + btrfs_inode_unlock(BTRFS_I(inode), ilock_flags); + ilock_flags &= ~BTRFS_ILOCK_SHARED; + goto relock; + } + btrfs_inode_unlock(BTRFS_I(inode), ilock_flags); goto out; } @@ -1496,6 +1518,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) pos + iov_iter_count(from) > i_size_read(inode)) { btrfs_inode_unlock(BTRFS_I(inode), ilock_flags); ilock_flags &= ~BTRFS_ILOCK_SHARED; + shared_lock = false; goto relock; } @@ -1632,7 +1655,7 @@ static ssize_t btrfs_encoded_write(struct kiocb *iocb, struct iov_iter *from, if (ret || encoded->len == 0) goto out; - ret = btrfs_write_check(iocb, from, encoded->len); + ret = btrfs_write_check(iocb, from, encoded->len, NULL); if (ret < 0) goto out; -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2023-09-06 15:13 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-30 18:15 [PATCH 0/2] Use exclusive lock for file_remove_privs Bernd Schubert 2023-08-30 18:15 ` [PATCH 1/2] fs: Add and export file_needs_remove_privs Bernd Schubert 2023-09-01 6:48 ` Christoph Hellwig 2023-08-30 18:15 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert 2023-08-31 7:40 ` [PATCH 0/2] Use exclusive lock for file_remove_privs Christoph Hellwig 2023-08-31 10:17 ` Bernd Schubert 2023-08-31 10:18 ` Mateusz Guzik 2023-08-31 14:41 ` Bernd Schubert 2023-09-01 6:49 ` Christoph Hellwig 2023-09-01 11:38 ` Matthew Wilcox 2023-09-01 11:47 ` Mateusz Guzik 2023-09-06 15:13 ` Matthew Wilcox -- strict thread matches above, loose matches on Subject: below -- 2023-08-31 11:24 [PATCH v2 " Bernd Schubert 2023-08-31 11:24 ` [PATCH 2/2] btrfs: file_remove_privs needs an exclusive lock Bernd Schubert
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox