* [PATCH v2 00/16] Tidy up file permission hooks
@ 2023-11-22 12:26 Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct() Amir Goldstein
` (17 more replies)
0 siblings, 18 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:26 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
Hi Christian,
During my work on fanotify "pre content" events [1], Jan and I noticed
some inconsistencies in the call sites of security_file_permission()
hooks inside rw_verify_area() and remap_verify_area().
The majority of call sites are before file_start_write(), which is how
we want them to be for fanotify "pre content" events.
For splice code, there are many duplicate calls to rw_verify_area()
for the entire range as well as for partial ranges inside iterator.
This cleanup series, mostly following Jan's suggestions, moves all
the security_file_permission() hooks before file_start_write() and
eliminates duplicate permission hook calls in the same call chain.
The last 3 patches are helpers that I used in fanotify patches to
assert that permission hooks are called with expected locking scope.
Please stage this work on a stable branch in the vfs tree, so that
I will be able to send Jan fanotify patches for "pre content" events
based on the stable vfs branch.
Thanks,
Amir.
Changes since v1:
- Split coda locking reorder patch (jaharkes)
- Fix vfs_iocb_iter_write() file_end_write() bug (Josef)
- Fix subtle allow_file_dedupe() bug (+rename) (brauner)
- Fix some minor review nits (brauner)
- Added RVB from Josef and Chuck
[1] https://github.com/amir73il/linux/commits/fan_pre_content
Amir Goldstein (16):
ovl: add permission hooks outside of do_splice_direct()
splice: remove permission hook from do_splice_direct()
splice: move permission hook out of splice_direct_to_actor()
splice: move permission hook out of splice_file_to_pipe()
splice: remove permission hook from iter_file_splice_write()
remap_range: move permission hooks out of do_clone_file_range()
remap_range: move file_start_write() to after permission hook
btrfs: move file_start_write() to after permission hook
coda: change locking order in coda_file_write_iter()
fs: move file_start_write() into vfs_iter_write()
fs: move permission hook out of do_iter_write()
fs: move permission hook out of do_iter_read()
fs: move kiocb_start_write() into vfs_iocb_iter_write()
fs: create __sb_write_started() helper
fs: create file_write_started() helper
fs: create {sb,file}_write_not_started() helpers
drivers/block/loop.c | 2 -
fs/btrfs/ioctl.c | 12 +--
fs/cachefiles/io.c | 5 +-
fs/coda/file.c | 2 -
fs/internal.h | 8 +-
fs/nfsd/vfs.c | 7 +-
fs/overlayfs/copy_up.c | 26 +++++-
fs/overlayfs/file.c | 10 +--
fs/read_write.c | 177 ++++++++++++++++++++++++++++-------------
fs/remap_range.c | 37 +++++----
fs/splice.c | 78 ++++++++++--------
include/linux/fs.h | 62 ++++++++++++++-
12 files changed, 297 insertions(+), 129 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:35 ` Christoph Hellwig
2023-11-23 16:28 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 02/16] splice: remove permission hook from do_splice_direct() Amir Goldstein
` (16 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
The main callers of do_splice_direct() also call rw_verify_area() for
the entire range that is being copied, e.g. by vfs_copy_file_range()
or do_sendfile() before calling do_splice_direct().
The only caller that does not have those checks for entire range is
ovl_copy_up_file(). In preparation for removing the checks inside
do_splice_direct(), add rw_verify_area() call in ovl_copy_up_file().
For extra safety, perform minimal sanity checks from rw_verify_area()
for non negative offsets also in the copy up do_splice_direct() loop
without calling the file permission hooks.
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/copy_up.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 4382881b0709..106f8643af3b 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -230,6 +230,19 @@ static int ovl_copy_fileattr(struct inode *inode, const struct path *old,
return ovl_real_fileattr_set(new, &newfa);
}
+static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
+{
+ loff_t tmp;
+
+ if (WARN_ON_ONCE(pos != pos2))
+ return -EIO;
+ if (WARN_ON_ONCE(pos < 0 || len < 0 || totlen < 0))
+ return -EIO;
+ if (WARN_ON_ONCE(check_add_overflow(pos, len, &tmp)))
+ return -EIO;
+ return 0;
+}
+
static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
struct file *new_file, loff_t len)
{
@@ -244,13 +257,20 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
int error = 0;
ovl_path_lowerdata(dentry, &datapath);
- if (WARN_ON(datapath.dentry == NULL))
+ if (WARN_ON_ONCE(datapath.dentry == NULL) ||
+ WARN_ON_ONCE(len < 0))
return -EIO;
old_file = ovl_path_open(&datapath, O_LARGEFILE | O_RDONLY);
if (IS_ERR(old_file))
return PTR_ERR(old_file);
+ error = rw_verify_area(READ, old_file, &old_pos, len);
+ if (!error)
+ error = rw_verify_area(WRITE, new_file, &new_pos, len);
+ if (error)
+ goto out_fput;
+
/* Try to use clone_file_range to clone up within the same fs */
ovl_start_write(dentry);
cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0);
@@ -309,6 +329,10 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
}
}
+ error = ovl_verify_area(old_pos, new_pos, this_len, len);
+ if (error)
+ break;
+
ovl_start_write(dentry);
bytes = do_splice_direct(old_file, &old_pos,
new_file, &new_pos,
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 02/16] splice: remove permission hook from do_splice_direct()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:36 ` Christoph Hellwig
2023-11-23 16:28 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor() Amir Goldstein
` (15 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
All callers of do_splice_direct() have a call to rw_verify_area() for
the entire range that is being copied, e.g. by vfs_copy_file_range() or
do_sendfile() before calling do_splice_direct().
The rw_verify_area() check inside do_splice_direct() is redundant and
is called after sb_start_write(), so it is not "start-write-safe".
Remove this redundant check.
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/splice.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index d983d375ff11..6e917db6f49a 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1166,6 +1166,7 @@ static void direct_file_splice_eof(struct splice_desc *sd)
* (splice in + splice out, as compared to just sendfile()). So this helper
* can splice directly through a process-private pipe.
*
+ * Callers already called rw_verify_area() on the entire range.
*/
long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
loff_t *opos, size_t len, unsigned int flags)
@@ -1187,10 +1188,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
if (unlikely(out->f_flags & O_APPEND))
return -EINVAL;
- ret = rw_verify_area(WRITE, out, opos, len);
- if (unlikely(ret < 0))
- return ret;
-
ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
if (ret > 0)
*ppos = sd.pos;
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct() Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 02/16] splice: remove permission hook from do_splice_direct() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:36 ` Christoph Hellwig
2023-11-23 16:35 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe() Amir Goldstein
` (14 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel, Chuck Lever
vfs_splice_read() has a permission hook inside rw_verify_area() and
it is called from do_splice_direct() -> splice_direct_to_actor().
The callers of do_splice_direct() (e.g. vfs_copy_file_range()) already
call rw_verify_area() for the entire range, but the other caller of
splice_direct_to_actor() (nfsd) does not.
Add the rw_verify_area() checks in nfsd_splice_read() and use a
variant of vfs_splice_read() without rw_verify_area() check in
splice_direct_to_actor() to avoid the redundant rw_verify_area() checks.
This is needed for fanotify "pre content" events.
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/nfsd/vfs.c | 5 ++++-
fs/splice.c | 58 +++++++++++++++++++++++++++++++--------------------
2 files changed, 39 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fbbea7498f02..5d704461e3b4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1046,7 +1046,10 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
ssize_t host_err;
trace_nfsd_read_splice(rqstp, fhp, offset, *count);
- host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
+ host_err = rw_verify_area(READ, file, &offset, *count);
+ if (!host_err)
+ host_err = splice_direct_to_actor(file, &sd,
+ nfsd_direct_splice_actor);
return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
}
diff --git a/fs/splice.c b/fs/splice.c
index 6e917db6f49a..6fc2c27e9520 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -944,27 +944,15 @@ static void do_splice_eof(struct splice_desc *sd)
sd->splice_eof(sd);
}
-/**
- * vfs_splice_read - Read data from a file and splice it into a pipe
- * @in: File to splice from
- * @ppos: Input file offset
- * @pipe: Pipe to splice to
- * @len: Number of bytes to splice
- * @flags: Splice modifier flags (SPLICE_F_*)
- *
- * Splice the requested amount of data from the input file to the pipe. This
- * is synchronous as the caller must hold the pipe lock across the entire
- * operation.
- *
- * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
- * a hole and a negative error code otherwise.
+/*
+ * Callers already called rw_verify_area() on the entire range.
+ * No need to call it for sub ranges.
*/
-long vfs_splice_read(struct file *in, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
- unsigned int flags)
+static long do_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
{
unsigned int p_space;
- int ret;
if (unlikely(!(in->f_mode & FMODE_READ)))
return -EBADF;
@@ -975,10 +963,6 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail);
len = min_t(size_t, len, p_space << PAGE_SHIFT);
- ret = rw_verify_area(READ, in, ppos, len);
- if (unlikely(ret < 0))
- return ret;
-
if (unlikely(len > MAX_RW_COUNT))
len = MAX_RW_COUNT;
@@ -992,6 +976,34 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
return copy_splice_read(in, ppos, pipe, len, flags);
return in->f_op->splice_read(in, ppos, pipe, len, flags);
}
+
+/**
+ * vfs_splice_read - Read data from a file and splice it into a pipe
+ * @in: File to splice from
+ * @ppos: Input file offset
+ * @pipe: Pipe to splice to
+ * @len: Number of bytes to splice
+ * @flags: Splice modifier flags (SPLICE_F_*)
+ *
+ * Splice the requested amount of data from the input file to the pipe. This
+ * is synchronous as the caller must hold the pipe lock across the entire
+ * operation.
+ *
+ * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
+ * a hole and a negative error code otherwise.
+ */
+long vfs_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ int ret;
+
+ ret = rw_verify_area(READ, in, ppos, len);
+ if (unlikely(ret < 0))
+ return ret;
+
+ return do_splice_read(in, ppos, pipe, len, flags);
+}
EXPORT_SYMBOL_GPL(vfs_splice_read);
/**
@@ -1066,7 +1078,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
size_t read_len;
loff_t pos = sd->pos, prev_pos = pos;
- ret = vfs_splice_read(in, &pos, pipe, len, flags);
+ ret = do_splice_read(in, &pos, pipe, len, flags);
if (unlikely(ret <= 0))
goto read_failure;
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (2 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:38 ` Christoph Hellwig
2023-11-23 16:37 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write() Amir Goldstein
` (13 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
vfs_splice_read() has a permission hook inside rw_verify_area() and
it is called from splice_file_to_pipe(), which is called from
do_splice() and do_sendfile().
do_sendfile() already has a rw_verify_area() check for the entire range.
do_splice() has a rw_verify_check() for the splice to file case, not for
the splice from file case.
Add the rw_verify_area() check for splice from file case in do_splice()
and use a variant of vfs_splice_read() without rw_verify_area() check
in splice_file_to_pipe() to avoid the redundant rw_verify_area() checks.
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/splice.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/splice.c b/fs/splice.c
index 6fc2c27e9520..d4fdd44c0b32 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1239,7 +1239,7 @@ long splice_file_to_pipe(struct file *in,
pipe_lock(opipe);
ret = wait_for_space(opipe, flags);
if (!ret)
- ret = vfs_splice_read(in, offset, opipe, len, flags);
+ ret = do_splice_read(in, offset, opipe, len, flags);
pipe_unlock(opipe);
if (ret > 0)
wakeup_pipe_readers(opipe);
@@ -1316,6 +1316,10 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out,
offset = in->f_pos;
}
+ ret = rw_verify_area(READ, in, &offset, len);
+ if (unlikely(ret < 0))
+ return ret;
+
if (out->f_flags & O_NONBLOCK)
flags |= SPLICE_F_NONBLOCK;
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (3 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:47 ` Christoph Hellwig
2023-11-22 12:27 ` [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range() Amir Goldstein
` (12 subsequent siblings)
17 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
All the callers of ->splice_write(), (e.g. do_splice_direct() and
do_splice()) already check rw_verify_area() for the entire range
and perform all the other checks that are in vfs_write_iter().
Create a helper do_iter_writev(), that performs the write without the
checks and use it in iter_file_splice_write() to avoid the redundant
rw_verify_area() checks.
This is needed for fanotify "pre content" events.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/internal.h | 8 +++++++-
fs/read_write.c | 11 +++++++++++
fs/splice.c | 9 ++++++---
3 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/fs/internal.h b/fs/internal.h
index 58e43341aebf..c114b85e27a7 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -298,7 +298,13 @@ static inline ssize_t do_get_acl(struct mnt_idmap *idmap,
}
#endif
-ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *pos);
+/*
+ * fs/read_write.c
+ */
+ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from,
+ loff_t *pos);
+ssize_t do_iter_writev(struct file *file, struct iov_iter *iter, loff_t *ppos,
+ rwf_t flags);
/*
* fs/attr.c
diff --git a/fs/read_write.c b/fs/read_write.c
index 4771701c896b..313f7eaaa9a7 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -739,6 +739,17 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
return ret;
}
+/*
+ * Low-level helpers don't perform rw sanity checks.
+ * The caller is responsible for that.
+ */
+ssize_t do_iter_writev(struct file *filp, struct iov_iter *iter, loff_t *ppos,
+ rwf_t flags)
+{
+ return do_iter_readv_writev(filp, iter, ppos, WRITE, flags);
+}
+
+
/* Do it by hand, with file-ops */
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
loff_t *ppos, int type, rwf_t flags)
diff --git a/fs/splice.c b/fs/splice.c
index d4fdd44c0b32..decbace5d812 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -673,10 +673,13 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
.u.file = out,
};
int nbufs = pipe->max_usage;
- struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
- GFP_KERNEL);
+ struct bio_vec *array;
ssize_t ret;
+ if (!out->f_op->write_iter)
+ return -EINVAL;
+
+ array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL);
if (unlikely(!array))
return -ENOMEM;
@@ -733,7 +736,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
}
iov_iter_bvec(&from, ITER_SOURCE, array, n, sd.total_len - left);
- ret = vfs_iter_write(out, &from, &sd.pos, 0);
+ ret = do_iter_writev(out, &from, &sd.pos, 0);
if (ret <= 0)
break;
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (4 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:47 ` Christoph Hellwig
2023-11-23 16:52 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook Amir Goldstein
` (11 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
In many of the vfs helpers, file permission hook is called before
taking sb_start_write(), making them "start-write-safe".
do_clone_file_range() is an exception to this rule.
do_clone_file_range() has two callers - vfs_clone_file_range() and
overlayfs. Move remap_verify_area() checks from do_clone_file_range()
out to vfs_clone_file_range() to make them "start-write-safe".
Overlayfs already has calls to rw_verify_area() with the same security
permission hooks as remap_verify_area() has.
The rest of the checks in remap_verify_area() are irrelevant for
overlayfs that calls do_clone_file_range() offset 0 and positive length.
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/remap_range.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 87ae4f0dc3aa..42f79cb2b1b1 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -385,14 +385,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
if (!file_in->f_op->remap_file_range)
return -EOPNOTSUPP;
- ret = remap_verify_area(file_in, pos_in, len, false);
- if (ret)
- return ret;
-
- ret = remap_verify_area(file_out, pos_out, len, true);
- if (ret)
- return ret;
-
ret = file_in->f_op->remap_file_range(file_in, pos_in,
file_out, pos_out, len, remap_flags);
if (ret < 0)
@@ -410,6 +402,14 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
{
loff_t ret;
+ ret = remap_verify_area(file_in, pos_in, len, false);
+ if (ret)
+ return ret;
+
+ ret = remap_verify_area(file_out, pos_out, len, true);
+ if (ret)
+ return ret;
+
file_start_write(file_out);
ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
remap_flags);
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (5 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:48 ` Christoph Hellwig
2023-11-23 16:53 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 08/16] btrfs: " Amir Goldstein
` (10 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
In vfs code, file_start_write() is usually called after the permission
hook in rw_verify_area(). vfs_dedupe_file_range_one() is an exception
to this rule.
In vfs_dedupe_file_range_one(), move file_start_write() to after the
the rw_verify_area() checks to make them "start-write-safe".
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/remap_range.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 42f79cb2b1b1..12131f2a6c9e 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -420,7 +420,7 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
EXPORT_SYMBOL(vfs_clone_file_range);
/* Check whether we are allowed to dedupe the destination file */
-static bool allow_file_dedupe(struct file *file)
+static bool may_dedupe_file(struct file *file)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct inode *inode = file_inode(file);
@@ -445,24 +445,29 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
WARN_ON_ONCE(remap_flags & ~(REMAP_FILE_DEDUP |
REMAP_FILE_CAN_SHORTEN));
- ret = mnt_want_write_file(dst_file);
- if (ret)
- return ret;
-
/*
* This is redundant if called from vfs_dedupe_file_range(), but other
* callers need it and it's not performance sesitive...
*/
ret = remap_verify_area(src_file, src_pos, len, false);
if (ret)
- goto out_drop_write;
+ return ret;
ret = remap_verify_area(dst_file, dst_pos, len, true);
if (ret)
- goto out_drop_write;
+ return ret;
+
+ /*
+ * This needs to be called after remap_verify_area() because of
+ * sb_start_write() and before may_dedupe_file() because the mount's
+ * MAY_WRITE need to be checked with mnt_get_write_access_file() held.
+ */
+ ret = mnt_want_write_file(dst_file);
+ if (ret)
+ return ret;
ret = -EPERM;
- if (!allow_file_dedupe(dst_file))
+ if (!may_dedupe_file(dst_file))
goto out_drop_write;
ret = -EXDEV;
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 08/16] btrfs: move file_start_write() to after permission hook
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (6 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:48 ` Christoph Hellwig
2023-11-23 16:54 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 09/16] coda: change locking order in coda_file_write_iter() Amir Goldstein
` (9 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
In vfs code, file_start_write() is usually called after the permission
hook in rw_verify_area(). btrfs_ioctl_encoded_write() in an exception
to this rule.
Move file_start_write() to after the rw_verify_area() check in encoded
write to make the permission hook "start-write-safe".
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/btrfs/ioctl.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dfe257e1845b..0a7850c4be67 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4523,29 +4523,29 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool
if (ret < 0)
goto out_acct;
- file_start_write(file);
-
if (iov_iter_count(&iter) == 0) {
ret = 0;
- goto out_end_write;
+ goto out_iov;
}
pos = args.offset;
ret = rw_verify_area(WRITE, file, &pos, args.len);
if (ret < 0)
- goto out_end_write;
+ goto out_iov;
init_sync_kiocb(&kiocb, file);
ret = kiocb_set_rw_flags(&kiocb, 0);
if (ret)
- goto out_end_write;
+ goto out_iov;
kiocb.ki_pos = pos;
+ file_start_write(file);
+
ret = btrfs_do_write_iter(&kiocb, &iter, &args);
if (ret > 0)
fsnotify_modify(file);
-out_end_write:
file_end_write(file);
+out_iov:
kfree(iov);
out_acct:
if (ret > 0)
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 09/16] coda: change locking order in coda_file_write_iter()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (7 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 08/16] btrfs: " Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 10/16] fs: move file_start_write() into vfs_iter_write() Amir Goldstein
` (8 subsequent siblings)
17 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel, Jan Harkes
The coda host file is a backing file for the coda inode on a different
filesystem than the coda inode.
Change the locking order to take the coda inode lock before taking
the backing host file freeze protection, same as in ovl_write_iter()
and in network filesystems that use cachefiles.
Link: https://lore.kernel.org/r/CAOQ4uxjcnwuF1gMxe64WLODGA_MyAy8x-DtqkCUxqVQKk3Xbng@mail.gmail.com/
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/coda/file.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 16acc58311ea..e62315c37386 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -79,14 +79,14 @@ coda_file_write_iter(struct kiocb *iocb, struct iov_iter *to)
if (ret)
goto finish_write;
- file_start_write(host_file);
inode_lock(coda_inode);
+ file_start_write(host_file);
ret = vfs_iter_write(cfi->cfi_container, to, &iocb->ki_pos, 0);
coda_inode->i_size = file_inode(host_file)->i_size;
coda_inode->i_blocks = (coda_inode->i_size + 511) >> 9;
inode_set_mtime_to_ts(coda_inode, inode_set_ctime_current(coda_inode));
- inode_unlock(coda_inode);
file_end_write(host_file);
+ inode_unlock(coda_inode);
finish_write:
venus_access_intent(coda_inode->i_sb, coda_i2f(coda_inode),
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 10/16] fs: move file_start_write() into vfs_iter_write()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (8 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 09/16] coda: change locking order in coda_file_write_iter() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 7:50 ` Christoph Hellwig
2023-11-22 12:27 ` [PATCH v2 11/16] fs: move permission hook out of do_iter_write() Amir Goldstein
` (7 subsequent siblings)
17 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
All the callers of vfs_iter_write() call file_start_write() just before
calling vfs_iter_write() except for target_core_file's fd_do_rw().
Move file_start_write() from the callers into vfs_iter_write().
fd_do_rw() calls vfs_iter_write() with a non-regular file, so
file_start_write() is a no-op.
This is needed for fanotify "pre content" events.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
drivers/block/loop.c | 2 --
fs/coda/file.c | 2 --
fs/nfsd/vfs.c | 2 --
fs/overlayfs/file.c | 2 --
fs/read_write.c | 13 ++++++++++---
5 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 9f2d412fc560..8a8cd4fc9238 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -245,9 +245,7 @@ static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)
iov_iter_bvec(&i, ITER_SOURCE, bvec, 1, bvec->bv_len);
- file_start_write(file);
bw = vfs_iter_write(file, &i, ppos, 0);
- file_end_write(file);
if (likely(bw == bvec->bv_len))
return 0;
diff --git a/fs/coda/file.c b/fs/coda/file.c
index e62315c37386..148856a582a9 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -80,12 +80,10 @@ coda_file_write_iter(struct kiocb *iocb, struct iov_iter *to)
goto finish_write;
inode_lock(coda_inode);
- file_start_write(host_file);
ret = vfs_iter_write(cfi->cfi_container, to, &iocb->ki_pos, 0);
coda_inode->i_size = file_inode(host_file)->i_size;
coda_inode->i_blocks = (coda_inode->i_size + 511) >> 9;
inode_set_mtime_to_ts(coda_inode, inode_set_ctime_current(coda_inode));
- file_end_write(host_file);
inode_unlock(coda_inode);
finish_write:
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5d704461e3b4..35c9546b3396 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1186,9 +1186,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
since = READ_ONCE(file->f_wb_err);
if (verf)
nfsd_copy_write_verifier(verf, nn);
- file_start_write(file);
host_err = vfs_iter_write(file, &iter, &pos, flags);
- file_end_write(file);
if (host_err < 0) {
commit_reset_write_verifier(nn, rqstp, host_err);
goto out_nfserr;
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 131621daeb13..690b173f34fc 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -436,9 +436,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
if (is_sync_kiocb(iocb)) {
rwf_t rwf = iocb_to_rw_flags(ifl);
- file_start_write(real.file);
ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf);
- file_end_write(real.file);
/* Update size */
ovl_file_modified(file);
} else {
diff --git a/fs/read_write.c b/fs/read_write.c
index 313f7eaaa9a7..87ca50f16a23 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -850,7 +850,7 @@ ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
EXPORT_SYMBOL(vfs_iter_read);
static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
- loff_t *pos, rwf_t flags)
+ loff_t *pos, rwf_t flags)
{
size_t tot_len;
ssize_t ret = 0;
@@ -905,11 +905,18 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
EXPORT_SYMBOL(vfs_iocb_iter_write);
ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
- rwf_t flags)
+ rwf_t flags)
{
+ int ret;
+
if (!file->f_op->write_iter)
return -EINVAL;
- return do_iter_write(file, iter, ppos, flags);
+
+ file_start_write(file);
+ ret = do_iter_write(file, iter, ppos, flags);
+ file_end_write(file);
+
+ return ret;
}
EXPORT_SYMBOL(vfs_iter_write);
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 11/16] fs: move permission hook out of do_iter_write()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (9 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 10/16] fs: move file_start_write() into vfs_iter_write() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-22 14:33 ` Christian Brauner
2023-11-23 17:08 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 12/16] fs: move permission hook out of do_iter_read() Amir Goldstein
` (6 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
In many of the vfs helpers, the rw_verity_area() checks are called before
taking sb_start_write(), making them "start-write-safe".
do_iter_write() is an exception to this rule.
do_iter_write() has two callers - vfs_iter_write() and vfs_writev().
Move rw_verify_area() and other checks from do_iter_write() out to
its callers to make them "start-write-safe".
Move also the fsnotify_modify() hook to align with similar pattern
used in vfs_write() and other vfs helpers.
This is needed for fanotify "pre content" events.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/read_write.c | 78 +++++++++++++++++++++++++++++--------------------
1 file changed, 47 insertions(+), 31 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 87ca50f16a23..6c40468efe19 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -852,28 +852,10 @@ EXPORT_SYMBOL(vfs_iter_read);
static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
loff_t *pos, rwf_t flags)
{
- size_t tot_len;
- ssize_t ret = 0;
-
- if (!(file->f_mode & FMODE_WRITE))
- return -EBADF;
- if (!(file->f_mode & FMODE_CAN_WRITE))
- return -EINVAL;
-
- tot_len = iov_iter_count(iter);
- if (!tot_len)
- return 0;
- ret = rw_verify_area(WRITE, file, pos, tot_len);
- if (ret < 0)
- return ret;
-
if (file->f_op->write_iter)
- ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
- else
- ret = do_loop_readv_writev(file, iter, pos, WRITE, flags);
- if (ret > 0)
- fsnotify_modify(file);
- return ret;
+ return do_iter_readv_writev(file, iter, pos, WRITE, flags);
+
+ return do_loop_readv_writev(file, iter, pos, WRITE, flags);
}
ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
@@ -907,13 +889,28 @@ EXPORT_SYMBOL(vfs_iocb_iter_write);
ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
rwf_t flags)
{
- int ret;
+ size_t tot_len;
+ ssize_t ret;
+ if (!(file->f_mode & FMODE_WRITE))
+ return -EBADF;
+ if (!(file->f_mode & FMODE_CAN_WRITE))
+ return -EINVAL;
if (!file->f_op->write_iter)
return -EINVAL;
+ tot_len = iov_iter_count(iter);
+ if (!tot_len)
+ return 0;
+
+ ret = rw_verify_area(WRITE, file, ppos, tot_len);
+ if (ret < 0)
+ return ret;
+
file_start_write(file);
ret = do_iter_write(file, iter, ppos, flags);
+ if (ret > 0)
+ fsnotify_modify(file);
file_end_write(file);
return ret;
@@ -938,20 +935,39 @@ static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
}
static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos, rwf_t flags)
+ unsigned long vlen, loff_t *pos, rwf_t flags)
{
struct iovec iovstack[UIO_FASTIOV];
struct iovec *iov = iovstack;
struct iov_iter iter;
- ssize_t ret;
+ size_t tot_len;
+ ssize_t ret = 0;
- ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
- if (ret >= 0) {
- file_start_write(file);
- ret = do_iter_write(file, &iter, pos, flags);
- file_end_write(file);
- kfree(iov);
- }
+ if (!(file->f_mode & FMODE_WRITE))
+ return -EBADF;
+ if (!(file->f_mode & FMODE_CAN_WRITE))
+ return -EINVAL;
+
+ ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov,
+ &iter);
+ if (ret < 0)
+ return ret;
+
+ tot_len = iov_iter_count(&iter);
+ if (!tot_len)
+ goto out;
+
+ ret = rw_verify_area(WRITE, file, pos, tot_len);
+ if (ret < 0)
+ goto out;
+
+ file_start_write(file);
+ ret = do_iter_write(file, &iter, pos, flags);
+ if (ret > 0)
+ fsnotify_modify(file);
+ file_end_write(file);
+out:
+ kfree(iov);
return ret;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 12/16] fs: move permission hook out of do_iter_read()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (10 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 11/16] fs: move permission hook out of do_iter_write() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 17:13 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 13/16] fs: move kiocb_start_write() into vfs_iocb_iter_write() Amir Goldstein
` (5 subsequent siblings)
17 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
We recently moved fsnotify hook, rw_verify_area() and other checks from
do_iter_write() out to its two callers.
for consistency, do the same thing for do_iter_read() - move the
rw_verify_area() checks and fsnotify hook to the callers vfs_iter_read()
and vfs_readv().
This aligns those vfs helpers with the pattern used in vfs_read() and
vfs_iocb_iter_read() and the vfs write helpers, where all the checks are
in the vfs helpers and the do_* or call_* helpers do the work.
This is needed for fanotify "pre content" events.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/read_write.c | 74 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 48 insertions(+), 26 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 6c40468efe19..9410c3e6a04e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -784,12 +784,27 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
return ret;
}
+/*
+ * Low-level helpers don't perform rw sanity checks.
+ * The caller is responsible for that.
+ */
static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
- loff_t *pos, rwf_t flags)
+ loff_t *pos, rwf_t flags)
+{
+ if (file->f_op->read_iter)
+ return do_iter_readv_writev(file, iter, pos, READ, flags);
+
+ return do_loop_readv_writev(file, iter, pos, READ, flags);
+}
+
+ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
+ struct iov_iter *iter)
{
size_t tot_len;
ssize_t ret = 0;
+ if (!file->f_op->read_iter)
+ return -EINVAL;
if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
@@ -798,22 +813,20 @@ static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
tot_len = iov_iter_count(iter);
if (!tot_len)
goto out;
- ret = rw_verify_area(READ, file, pos, tot_len);
+ ret = rw_verify_area(READ, file, &iocb->ki_pos, tot_len);
if (ret < 0)
return ret;
- if (file->f_op->read_iter)
- ret = do_iter_readv_writev(file, iter, pos, READ, flags);
- else
- ret = do_loop_readv_writev(file, iter, pos, READ, flags);
+ ret = call_read_iter(file, iocb, iter);
out:
if (ret >= 0)
fsnotify_access(file);
return ret;
}
+EXPORT_SYMBOL(vfs_iocb_iter_read);
-ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
- struct iov_iter *iter)
+ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
+ rwf_t flags)
{
size_t tot_len;
ssize_t ret = 0;
@@ -828,25 +841,16 @@ ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
tot_len = iov_iter_count(iter);
if (!tot_len)
goto out;
- ret = rw_verify_area(READ, file, &iocb->ki_pos, tot_len);
+ ret = rw_verify_area(READ, file, ppos, tot_len);
if (ret < 0)
return ret;
- ret = call_read_iter(file, iocb, iter);
+ ret = do_iter_read(file, iter, ppos, flags);
out:
if (ret >= 0)
fsnotify_access(file);
return ret;
}
-EXPORT_SYMBOL(vfs_iocb_iter_read);
-
-ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
- rwf_t flags)
-{
- if (!file->f_op->read_iter)
- return -EINVAL;
- return do_iter_read(file, iter, ppos, flags);
-}
EXPORT_SYMBOL(vfs_iter_read);
static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
@@ -918,19 +922,37 @@ ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
EXPORT_SYMBOL(vfs_iter_write);
static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos, rwf_t flags)
+ unsigned long vlen, loff_t *pos, rwf_t flags)
{
struct iovec iovstack[UIO_FASTIOV];
struct iovec *iov = iovstack;
struct iov_iter iter;
- ssize_t ret;
+ size_t tot_len;
+ ssize_t ret = 0;
- ret = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
- if (ret >= 0) {
- ret = do_iter_read(file, &iter, pos, flags);
- kfree(iov);
- }
+ if (!(file->f_mode & FMODE_READ))
+ return -EBADF;
+ if (!(file->f_mode & FMODE_CAN_READ))
+ return -EINVAL;
+ ret = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov,
+ &iter);
+ if (ret < 0)
+ return ret;
+
+ tot_len = iov_iter_count(&iter);
+ if (!tot_len)
+ goto out;
+
+ ret = rw_verify_area(READ, file, pos, tot_len);
+ if (ret < 0)
+ goto out;
+
+ ret = do_iter_read(file, &iter, pos, flags);
+out:
+ if (ret >= 0)
+ fsnotify_access(file);
+ kfree(iov);
return ret;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 13/16] fs: move kiocb_start_write() into vfs_iocb_iter_write()
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (11 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 12/16] fs: move permission hook out of do_iter_read() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 17:30 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 14/16] fs: create __sb_write_started() helper Amir Goldstein
` (4 subsequent siblings)
17 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
In vfs code, sb_start_write() is usually called after the permission hook
in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule,
where kiocb_start_write() is called by its callers.
Move kiocb_start_write() from the callers into vfs_iocb_iter_write()
after the rw_verify_area() checks, to make them "start-write-safe".
The semantics of vfs_iocb_iter_write() is changed, so that the caller is
responsible for calling kiocb_end_write() on completion only if async
iocb was queued. The completion handlers of both callers were adapted
to this semantic change.
This is needed for fanotify "pre content" events.
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/cachefiles/io.c | 5 ++---
fs/overlayfs/file.c | 8 ++++----
fs/read_write.c | 7 +++++++
3 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 009d23cd435b..5857241c5918 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -259,7 +259,8 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
_enter("%ld", ret);
- kiocb_end_write(iocb);
+ if (ki->was_async)
+ kiocb_end_write(iocb);
if (ret < 0)
trace_cachefiles_io_error(object, inode, ret,
@@ -319,8 +320,6 @@ int __cachefiles_write(struct cachefiles_object *object,
ki->iocb.ki_complete = cachefiles_write_complete;
atomic_long_add(ki->b_writing, &cache->b_writing);
- kiocb_start_write(&ki->iocb);
-
get_file(ki->iocb.ki_filp);
cachefiles_grab_object(object, cachefiles_obj_get_ioreq);
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 690b173f34fc..4e46420c8fdd 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -295,10 +295,8 @@ static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
struct kiocb *iocb = &aio_req->iocb;
struct kiocb *orig_iocb = aio_req->orig_iocb;
- if (iocb->ki_flags & IOCB_WRITE) {
- kiocb_end_write(iocb);
+ if (iocb->ki_flags & IOCB_WRITE)
ovl_file_modified(orig_iocb->ki_filp);
- }
orig_iocb->ki_pos = iocb->ki_pos;
ovl_aio_put(aio_req);
@@ -310,6 +308,9 @@ static void ovl_aio_rw_complete(struct kiocb *iocb, long res)
struct ovl_aio_req, iocb);
struct kiocb *orig_iocb = aio_req->orig_iocb;
+ if (iocb->ki_flags & IOCB_WRITE)
+ kiocb_end_write(iocb);
+
ovl_aio_cleanup_handler(aio_req);
orig_iocb->ki_complete(orig_iocb, res);
}
@@ -456,7 +457,6 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
aio_req->iocb.ki_flags = ifl;
aio_req->iocb.ki_complete = ovl_aio_queue_completion;
refcount_set(&aio_req->ref, 2);
- kiocb_start_write(&aio_req->iocb);
ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
ovl_aio_put(aio_req);
if (ret != -EIOCBQUEUED)
diff --git a/fs/read_write.c b/fs/read_write.c
index 9410c3e6a04e..ed0ea1132cee 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -862,6 +862,10 @@ static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
return do_loop_readv_writev(file, iter, pos, WRITE, flags);
}
+/*
+ * Caller is responsible for calling kiocb_end_write() on completion
+ * if async iocb was queued.
+ */
ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
struct iov_iter *iter)
{
@@ -882,7 +886,10 @@ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
if (ret < 0)
return ret;
+ kiocb_start_write(iocb);
ret = call_write_iter(file, iocb, iter);
+ if (ret != -EIOCBQUEUED)
+ kiocb_end_write(iocb);
if (ret > 0)
fsnotify_modify(file);
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 14/16] fs: create __sb_write_started() helper
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (12 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 13/16] fs: move kiocb_start_write() into vfs_iocb_iter_write() Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 17:31 ` Jan Kara
2023-11-24 9:14 ` Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 15/16] fs: create file_write_started() helper Amir Goldstein
` (3 subsequent siblings)
17 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
Similar to sb_write_started() for use by other sb freeze levels.
Unlike the boolean sb_write_started(), this helper returns a tristate
to distiguish the cases of lockdep disabled or unknown lock state.
This is needed for fanotify "pre content" events.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
include/linux/fs.h | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..e8aa48797bf4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1645,9 +1645,22 @@ static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
#define __sb_writers_release(sb, lev) \
percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
+/**
+ * __sb_write_started - check if sb freeze level is held
+ * @sb: the super we write to
+ *
+ * > 0 sb freeze level is held
+ * 0 sb freeze level is not held
+ * < 0 !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN
+ */
+static inline int __sb_write_started(const struct super_block *sb, int level)
+{
+ return lockdep_is_held_type(sb->s_writers.rw_sem + level - 1, 1);
+}
+
static inline bool sb_write_started(const struct super_block *sb)
{
- return lockdep_is_held_type(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1, 1);
+ return __sb_write_started(sb, SB_FREEZE_WRITE);
}
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 15/16] fs: create file_write_started() helper
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (13 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 14/16] fs: create __sb_write_started() helper Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers Amir Goldstein
` (2 subsequent siblings)
17 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
Convenience wrapper for sb_write_started(file_inode(inode)->i_sb)), which
has a single occurrence in the code right now.
Document the false negatives of those helpers, which makes them unusable
to assert that sb_start_write() is not held.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/read_write.c | 2 +-
include/linux/fs.h | 21 +++++++++++++++++++++
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index ed0ea1132cee..f9e0a5b1a817 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1450,7 +1450,7 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
size_t len, unsigned int flags)
{
- lockdep_assert(sb_write_started(file_inode(file_out)->i_sb));
+ lockdep_assert(file_write_started(file_out));
return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e8aa48797bf4..05780d993c7d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1658,11 +1658,32 @@ static inline int __sb_write_started(const struct super_block *sb, int level)
return lockdep_is_held_type(sb->s_writers.rw_sem + level - 1, 1);
}
+/**
+ * sb_write_started - check if SB_FREEZE_WRITE is held
+ * @sb: the super we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ */
static inline bool sb_write_started(const struct super_block *sb)
{
return __sb_write_started(sb, SB_FREEZE_WRITE);
}
+/**
+ * file_write_started - check if SB_FREEZE_WRITE is held
+ * @file: the file we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ * May be false positive with !S_ISREG, because file_start_write() has
+ * no effect on !S_ISREG.
+ */
+static inline bool file_write_started(const struct file *file)
+{
+ if (!S_ISREG(file_inode(file)->i_mode))
+ return true;
+ return sb_write_started(file_inode(file)->i_sb);
+}
+
/**
* sb_end_write - drop write access to a superblock
* @sb: the super we wrote to
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (14 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 15/16] fs: create file_write_started() helper Amir Goldstein
@ 2023-11-22 12:27 ` Amir Goldstein
2023-11-23 17:35 ` Jan Kara
2023-11-22 14:06 ` [PATCH v2 00/16] Tidy up file permission hooks Josef Bacik
2023-11-22 15:04 ` Christian Brauner
17 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 12:27 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
Create new helpers {sb,file}_write_not_started() that can be used
to assert that sb_start_write() is not held.
This is needed for fanotify "pre content" events.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
include/linux/fs.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 05780d993c7d..c085172f832f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1669,6 +1669,17 @@ static inline bool sb_write_started(const struct super_block *sb)
return __sb_write_started(sb, SB_FREEZE_WRITE);
}
+/**
+ * sb_write_not_started - check if SB_FREEZE_WRITE is not held
+ * @sb: the super we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ */
+static inline bool sb_write_not_started(const struct super_block *sb)
+{
+ return __sb_write_started(sb, SB_FREEZE_WRITE) <= 0;
+}
+
/**
* file_write_started - check if SB_FREEZE_WRITE is held
* @file: the file we write to
@@ -1684,6 +1695,21 @@ static inline bool file_write_started(const struct file *file)
return sb_write_started(file_inode(file)->i_sb);
}
+/**
+ * file_write_not_started - check if SB_FREEZE_WRITE is not held
+ * @file: the file we write to
+ *
+ * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
+ * May be false positive with !S_ISREG, because file_start_write() has
+ * no effect on !S_ISREG.
+ */
+static inline bool file_write_not_started(const struct file *file)
+{
+ if (!S_ISREG(file_inode(file)->i_mode))
+ return true;
+ return sb_write_not_started(file_inode(file)->i_sb);
+}
+
/**
* sb_end_write - drop write access to a superblock
* @sb: the super we wrote to
--
2.34.1
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 00/16] Tidy up file permission hooks
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (15 preceding siblings ...)
2023-11-22 12:27 ` [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers Amir Goldstein
@ 2023-11-22 14:06 ` Josef Bacik
2023-11-22 15:04 ` Christian Brauner
17 siblings, 0 replies; 55+ messages in thread
From: Josef Bacik @ 2023-11-22 14:06 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, David Howells, Jens Axboe,
Miklos Szeredi, Al Viro, linux-fsdevel
On Wed, Nov 22, 2023 at 02:26:59PM +0200, Amir Goldstein wrote:
> Hi Christian,
>
> During my work on fanotify "pre content" events [1], Jan and I noticed
> some inconsistencies in the call sites of security_file_permission()
> hooks inside rw_verify_area() and remap_verify_area().
>
> The majority of call sites are before file_start_write(), which is how
> we want them to be for fanotify "pre content" events.
>
> For splice code, there are many duplicate calls to rw_verify_area()
> for the entire range as well as for partial ranges inside iterator.
>
> This cleanup series, mostly following Jan's suggestions, moves all
> the security_file_permission() hooks before file_start_write() and
> eliminates duplicate permission hook calls in the same call chain.
>
> The last 3 patches are helpers that I used in fanotify patches to
> assert that permission hooks are called with expected locking scope.
>
> Please stage this work on a stable branch in the vfs tree, so that
> I will be able to send Jan fanotify patches for "pre content" events
> based on the stable vfs branch.
>
You can add
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
To the rest of the patches that don't already have my reviewed-by. Thanks,
Josef
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 11/16] fs: move permission hook out of do_iter_write()
2023-11-22 12:27 ` [PATCH v2 11/16] fs: move permission hook out of do_iter_write() Amir Goldstein
@ 2023-11-22 14:33 ` Christian Brauner
2023-11-22 15:25 ` Amir Goldstein
2023-11-23 17:08 ` Jan Kara
1 sibling, 1 reply; 55+ messages in thread
From: Christian Brauner @ 2023-11-22 14:33 UTC (permalink / raw)
To: Amir Goldstein
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
> - ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
> - if (ret >= 0) {
> - file_start_write(file);
> - ret = do_iter_write(file, &iter, pos, flags);
> - file_end_write(file);
> - kfree(iov);
> - }
> + if (!(file->f_mode & FMODE_WRITE))
> + return -EBADF;
> + if (!(file->f_mode & FMODE_CAN_WRITE))
> + return -EINVAL;
> +
> + ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov,
> + &iter);
> + if (ret < 0)
> + return ret;
> +
> + tot_len = iov_iter_count(&iter);
> + if (!tot_len)
> + goto out;
Fwiw, the logic is slightly changed here. This now relies on
import_iovec() >= 0 then iov_iter_count() >= 0.
If that's ever changed and iov_iter_count() can return an error even
though import_iovec() succeeded we'll be returning the number of
imported bytes even though nothing was written and also generate a
fsnotify event because ret still points to the number of imported bytes.
The way it was written before it didn't matter because this was hidden
in a function call that returned 0 and initialized ret again. Anyway, I
can just massage that in-tree if that's worth it. Nothing to do for you.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 00/16] Tidy up file permission hooks
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
` (16 preceding siblings ...)
2023-11-22 14:06 ` [PATCH v2 00/16] Tidy up file permission hooks Josef Bacik
@ 2023-11-22 15:04 ` Christian Brauner
17 siblings, 0 replies; 55+ messages in thread
From: Christian Brauner @ 2023-11-22 15:04 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed, 22 Nov 2023 14:26:59 +0200, Amir Goldstein wrote:
> During my work on fanotify "pre content" events [1], Jan and I noticed
> some inconsistencies in the call sites of security_file_permission()
> hooks inside rw_verify_area() and remap_verify_area().
>
> The majority of call sites are before file_start_write(), which is how
> we want them to be for fanotify "pre content" events.
>
> [...]
Applied to the vfs.rw branch of the vfs/vfs.git tree.
Patches in the vfs.rw branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.rw
[01/16] ovl: add permission hooks outside of do_splice_direct()
https://git.kernel.org/vfs/vfs/c/52009acbab21
[02/16] splice: remove permission hook from do_splice_direct()
https://git.kernel.org/vfs/vfs/c/095b2d7c4b93
[03/16] splice: move permission hook out of splice_direct_to_actor()
https://git.kernel.org/vfs/vfs/c/ead0aac245dd
[04/16] splice: move permission hook out of splice_file_to_pipe()
https://git.kernel.org/vfs/vfs/c/74a66b648378
[05/16] splice: remove permission hook from iter_file_splice_write()
https://git.kernel.org/vfs/vfs/c/e341582fdb7c
[06/16] remap_range: move permission hooks out of do_clone_file_range()
https://git.kernel.org/vfs/vfs/c/6ddfddac4922
[07/16] remap_range: move file_start_write() to after permission hook
https://git.kernel.org/vfs/vfs/c/a1fffabd504c
[08/16] btrfs: move file_start_write() to after permission hook
https://git.kernel.org/vfs/vfs/c/ca267f48c8e9
[09/16] coda: change locking order in coda_file_write_iter()
https://git.kernel.org/vfs/vfs/c/aee32ff62e8b
[10/16] fs: move file_start_write() into vfs_iter_write()
https://git.kernel.org/vfs/vfs/c/f02abb810579
[11/16] fs: move permission hook out of do_iter_write()
https://git.kernel.org/vfs/vfs/c/11dc9bc73318
[12/16] fs: move permission hook out of do_iter_read()
https://git.kernel.org/vfs/vfs/c/c8b86e93b6e2
[13/16] fs: move kiocb_start_write() into vfs_iocb_iter_write()
https://git.kernel.org/vfs/vfs/c/a4e6c478189e
[14/16] fs: create __sb_write_started() helper
https://git.kernel.org/vfs/vfs/c/2a7b49f698d0
[15/16] fs: create file_write_started() helper
https://git.kernel.org/vfs/vfs/c/0d3b7690bd1f
[16/16] fs: create {sb,file}_write_not_started() helpers
https://git.kernel.org/vfs/vfs/c/c88b5e392b2e
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 11/16] fs: move permission hook out of do_iter_write()
2023-11-22 14:33 ` Christian Brauner
@ 2023-11-22 15:25 ` Amir Goldstein
0 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-22 15:25 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
On Wed, Nov 22, 2023 at 4:34 PM Christian Brauner <brauner@kernel.org> wrote:
>
> > - ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
> > - if (ret >= 0) {
> > - file_start_write(file);
> > - ret = do_iter_write(file, &iter, pos, flags);
> > - file_end_write(file);
> > - kfree(iov);
> > - }
> > + if (!(file->f_mode & FMODE_WRITE))
> > + return -EBADF;
> > + if (!(file->f_mode & FMODE_CAN_WRITE))
> > + return -EINVAL;
> > +
> > + ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov,
> > + &iter);
> > + if (ret < 0)
> > + return ret;
> > +
> > + tot_len = iov_iter_count(&iter);
> > + if (!tot_len)
> > + goto out;
>
> Fwiw, the logic is slightly changed here. This now relies on
> import_iovec() >= 0 then iov_iter_count() >= 0.
>
> If that's ever changed and iov_iter_count() can return an error even
> though import_iovec() succeeded we'll be returning the number of
> imported bytes even though nothing was written and also generate a
> fsnotify event because ret still points to the number of imported bytes.
>
> The way it was written before it didn't matter because this was hidden
> in a function call that returned 0 and initialized ret again.
Good catch!
> Anyway, I
> can just massage that in-tree if that's worth it. Nothing to do for you.
Thank you!
Amir.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct()
2023-11-22 12:27 ` [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct() Amir Goldstein
@ 2023-11-23 7:35 ` Christoph Hellwig
2023-11-23 16:28 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:35 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 02/16] splice: remove permission hook from do_splice_direct()
2023-11-22 12:27 ` [PATCH v2 02/16] splice: remove permission hook from do_splice_direct() Amir Goldstein
@ 2023-11-23 7:36 ` Christoph Hellwig
2023-11-23 16:28 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:36 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor()
2023-11-22 12:27 ` [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor() Amir Goldstein
@ 2023-11-23 7:36 ` Christoph Hellwig
2023-11-23 16:35 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:36 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel, Chuck Lever
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe()
2023-11-22 12:27 ` [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe() Amir Goldstein
@ 2023-11-23 7:38 ` Christoph Hellwig
2023-11-23 16:37 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:38 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-22 12:27 ` [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write() Amir Goldstein
@ 2023-11-23 7:47 ` Christoph Hellwig
2023-11-23 11:20 ` Amir Goldstein
0 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:47 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
> +ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from,
> + loff_t *pos);
> +ssize_t do_iter_writev(struct file *file, struct iov_iter *iter, loff_t *ppos,
> + rwf_t flags);
So I first stumbled on the name because I kinda hate do_* functions,
especially if they are not static, and then went down a little rathole:
- first obviously the name, based on the other functions it probably
should be in the __kernel_* namespace unless I'm missing something.
- second can you add a little comment when it is to be used? Our
maze of read and write helpers is becomeing a little too confusing
even for someone who thought he knew the code (and wrote some if it).
- can we just split do_iter_readv_writev instead of adding a wrapper
Yes, that'll duplicate a little boiler plate code, but it'll make
things much easier to follow.
(- same probably for do_loop_readv_writev, although not directly
relevant to this series)
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range()
2023-11-22 12:27 ` [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range() Amir Goldstein
@ 2023-11-23 7:47 ` Christoph Hellwig
2023-11-23 16:52 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:47 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook
2023-11-22 12:27 ` [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook Amir Goldstein
@ 2023-11-23 7:48 ` Christoph Hellwig
2023-11-23 16:53 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:48 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] btrfs: move file_start_write() to after permission hook
2023-11-22 12:27 ` [PATCH v2 08/16] btrfs: " Amir Goldstein
@ 2023-11-23 7:48 ` Christoph Hellwig
2023-11-23 16:54 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:48 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 10/16] fs: move file_start_write() into vfs_iter_write()
2023-11-22 12:27 ` [PATCH v2 10/16] fs: move file_start_write() into vfs_iter_write() Amir Goldstein
@ 2023-11-23 7:50 ` Christoph Hellwig
2023-11-23 8:04 ` Amir Goldstein
0 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 7:50 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed, Nov 22, 2023 at 02:27:09PM +0200, Amir Goldstein wrote:
> All the callers of vfs_iter_write() call file_start_write() just before
> calling vfs_iter_write() except for target_core_file's fd_do_rw().
Can you add a patch to first add it to fd_do_rw? While this crates a
bit of churn, it has two benefits:
- it gives us an easily backportable fix
- it makes this patch a transformation that doesn't change behavior.
Please also cc the scsi and target lists. It probably makes sense to
even expedite it and send it for 6.7-rc inclusion separately.
be a 6.7 candidate
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 10/16] fs: move file_start_write() into vfs_iter_write()
2023-11-23 7:50 ` Christoph Hellwig
@ 2023-11-23 8:04 ` Amir Goldstein
0 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-23 8:04 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Thu, Nov 23, 2023 at 9:50 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Nov 22, 2023 at 02:27:09PM +0200, Amir Goldstein wrote:
> > All the callers of vfs_iter_write() call file_start_write() just before
> > calling vfs_iter_write() except for target_core_file's fd_do_rw().
>
> Can you add a patch to first add it to fd_do_rw? While this crates a
> bit of churn, it has two benefits:
>
> - it gives us an easily backportable fix
> - it makes this patch a transformation that doesn't change behavior.
>
> Please also cc the scsi and target lists. It probably makes sense to
> even expedite it and send it for 6.7-rc inclusion separately.
> be a 6.7 candidate
Good idea. will do!
Thanks for the review!
Amir.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-23 7:47 ` Christoph Hellwig
@ 2023-11-23 11:20 ` Amir Goldstein
2023-11-23 15:14 ` Christoph Hellwig
0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-23 11:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Thu, Nov 23, 2023 at 9:47 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> > +ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from,
> > + loff_t *pos);
> > +ssize_t do_iter_writev(struct file *file, struct iov_iter *iter, loff_t *ppos,
> > + rwf_t flags);
>
> So I first stumbled on the name because I kinda hate do_* functions,
> especially if they are not static, and then went down a little rathole:
>
>
> - first obviously the name, based on the other functions it probably
> should be in the __kernel_* namespace unless I'm missing something.
Not sure about the best name.
I just derived the name from do_iter_readv_writev(), which would be the
name of this helper if we split up do_iter_readv_writev() as you suggested.
> - second can you add a little comment when it is to be used? Our
> maze of read and write helpers is becoming a little too confusing
> even for someone who thought he knew the code (and wrote some if it).
I could try, but I can't say that I know how to document the maze.
I was trying to make small changes that make the maze a bit
more consistent - after my series, in most cases we have:
vfs_XXX()
rw_verify_area()
file_start_write()
do_XXX()
But obviously, there is still a wide variety of do_* helpers
with different conventions.
> - can we just split do_iter_readv_writev instead of adding a wrapper
> Yes, that'll duplicate a little boiler plate code, but it'll make
> things much easier to follow.
> (- same probably for do_loop_readv_writev, although not directly
> relevant to this series)
>
We can do that, it's easy, but I also got opposite comments from
Christian about opportunities to factor out more common code, so
I will do it if there is consensus.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-23 11:20 ` Amir Goldstein
@ 2023-11-23 15:14 ` Christoph Hellwig
2023-11-23 15:41 ` Amir Goldstein
0 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 15:14 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Josef Bacik,
David Howells, Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Thu, Nov 23, 2023 at 01:20:13PM +0200, Amir Goldstein wrote:
> > - first obviously the name, based on the other functions it probably
> > should be in the __kernel_* namespace unless I'm missing something.
>
> Not sure about the best name.
> I just derived the name from do_iter_readv_writev(), which would be the
> name of this helper if we split up do_iter_readv_writev() as you suggested.
Well, I don't think do_iter_readv_writev is a particular good name
even now, but certainly not once it is more than just a static helper
with two callers.
I can't say __kernel is an all that great name either, but it seems
to match the existing ones.
That being said - it just is a tiny wrapper anyway, what about just
open coding it instead of bike shedding? See below for a patch,
this actually seems pretty reasonable and very readable.
---
diff --git a/fs/splice.c b/fs/splice.c
index d983d375ff1130..982a0872fa03e9 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -684,6 +684,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
splice_from_pipe_begin(&sd);
while (sd.total_len) {
+ struct kiocb kiocb;
struct iov_iter from;
unsigned int head, tail, mask;
size_t left;
@@ -733,7 +734,10 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
}
iov_iter_bvec(&from, ITER_SOURCE, array, n, sd.total_len - left);
- ret = vfs_iter_write(out, &from, &sd.pos, 0);
+ init_sync_kiocb(&kiocb, out);
+ kiocb.ki_pos = sd.pos;
+ ret = out->f_op->write_iter(&kiocb, &from);
+ sd.pos = kiocb.ki_pos;
if (ret <= 0)
break;
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-23 15:14 ` Christoph Hellwig
@ 2023-11-23 15:41 ` Amir Goldstein
2023-11-23 15:45 ` Christoph Hellwig
2023-11-23 16:22 ` Christian Brauner
0 siblings, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-23 15:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Thu, Nov 23, 2023 at 5:14 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Nov 23, 2023 at 01:20:13PM +0200, Amir Goldstein wrote:
> > > - first obviously the name, based on the other functions it probably
> > > should be in the __kernel_* namespace unless I'm missing something.
> >
> > Not sure about the best name.
> > I just derived the name from do_iter_readv_writev(), which would be the
> > name of this helper if we split up do_iter_readv_writev() as you suggested.
>
> Well, I don't think do_iter_readv_writev is a particular good name
> even now, but certainly not once it is more than just a static helper
> with two callers.
>
> I can't say __kernel is an all that great name either, but it seems
> to match the existing ones.
>
> That being said - it just is a tiny wrapper anyway, what about just
> open coding it instead of bike shedding? See below for a patch,
> this actually seems pretty reasonable and very readable.
>
Heh! avoiding bike shedding is a worthy cause :)
It works for me.
I will let Christian decide if he prefers this over the existing
small helper with meaningless name.
> ---
> diff --git a/fs/splice.c b/fs/splice.c
> index d983d375ff1130..982a0872fa03e9 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -684,6 +684,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
>
> splice_from_pipe_begin(&sd);
> while (sd.total_len) {
> + struct kiocb kiocb;
> struct iov_iter from;
> unsigned int head, tail, mask;
> size_t left;
> @@ -733,7 +734,10 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> }
>
> iov_iter_bvec(&from, ITER_SOURCE, array, n, sd.total_len - left);
> - ret = vfs_iter_write(out, &from, &sd.pos, 0);
> + init_sync_kiocb(&kiocb, out);
> + kiocb.ki_pos = sd.pos;
> + ret = out->f_op->write_iter(&kiocb, &from);
> + sd.pos = kiocb.ki_pos;
> if (ret <= 0)
> break;
>
Are we open coding call_write_iter() now?
Is that a trend that I am not aware of?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-23 15:41 ` Amir Goldstein
@ 2023-11-23 15:45 ` Christoph Hellwig
2023-11-23 16:22 ` Christian Brauner
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2023-11-23 15:45 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christoph Hellwig, Christian Brauner, Jan Kara, Josef Bacik,
David Howells, Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
> Are we open coding call_write_iter() now?
> Is that a trend that I am not aware of?
I thought we finally agreed on killing the damn thing. I was about to
get ready to send a series to kill after wading through the read/write
helpers again..
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-23 15:41 ` Amir Goldstein
2023-11-23 15:45 ` Christoph Hellwig
@ 2023-11-23 16:22 ` Christian Brauner
2023-11-23 16:53 ` Amir Goldstein
1 sibling, 1 reply; 55+ messages in thread
From: Christian Brauner @ 2023-11-23 16:22 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christoph Hellwig, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
> > diff --git a/fs/splice.c b/fs/splice.c
> > index d983d375ff1130..982a0872fa03e9 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -684,6 +684,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> >
> > splice_from_pipe_begin(&sd);
> > while (sd.total_len) {
> > + struct kiocb kiocb;
> > struct iov_iter from;
> > unsigned int head, tail, mask;
> > size_t left;
> > @@ -733,7 +734,10 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> > }
> >
> > iov_iter_bvec(&from, ITER_SOURCE, array, n, sd.total_len - left);
> > - ret = vfs_iter_write(out, &from, &sd.pos, 0);
> > + init_sync_kiocb(&kiocb, out);
> > + kiocb.ki_pos = sd.pos;
> > + ret = out->f_op->write_iter(&kiocb, &from);
> > + sd.pos = kiocb.ki_pos;
> > if (ret <= 0)
> > break;
> >
>
> Are we open coding call_write_iter() now?
> Is that a trend that I am not aware of?
I'll fold that in as-is but I'll use call_write_iter() for now.
We can remove that later. For now consistency matters more.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct()
2023-11-22 12:27 ` [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct() Amir Goldstein
2023-11-23 7:35 ` Christoph Hellwig
@ 2023-11-23 16:28 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 16:28 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:00, Amir Goldstein wrote:
> The main callers of do_splice_direct() also call rw_verify_area() for
> the entire range that is being copied, e.g. by vfs_copy_file_range()
> or do_sendfile() before calling do_splice_direct().
>
> The only caller that does not have those checks for entire range is
> ovl_copy_up_file(). In preparation for removing the checks inside
> do_splice_direct(), add rw_verify_area() call in ovl_copy_up_file().
>
> For extra safety, perform minimal sanity checks from rw_verify_area()
> for non negative offsets also in the copy up do_splice_direct() loop
> without calling the file permission hooks.
>
> This is needed for fanotify "pre content" events.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/overlayfs/copy_up.c | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 4382881b0709..106f8643af3b 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -230,6 +230,19 @@ static int ovl_copy_fileattr(struct inode *inode, const struct path *old,
> return ovl_real_fileattr_set(new, &newfa);
> }
>
> +static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
> +{
> + loff_t tmp;
> +
> + if (WARN_ON_ONCE(pos != pos2))
> + return -EIO;
> + if (WARN_ON_ONCE(pos < 0 || len < 0 || totlen < 0))
> + return -EIO;
> + if (WARN_ON_ONCE(check_add_overflow(pos, len, &tmp)))
> + return -EIO;
> + return 0;
> +}
> +
> static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
> struct file *new_file, loff_t len)
> {
> @@ -244,13 +257,20 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
> int error = 0;
>
> ovl_path_lowerdata(dentry, &datapath);
> - if (WARN_ON(datapath.dentry == NULL))
> + if (WARN_ON_ONCE(datapath.dentry == NULL) ||
> + WARN_ON_ONCE(len < 0))
> return -EIO;
>
> old_file = ovl_path_open(&datapath, O_LARGEFILE | O_RDONLY);
> if (IS_ERR(old_file))
> return PTR_ERR(old_file);
>
> + error = rw_verify_area(READ, old_file, &old_pos, len);
> + if (!error)
> + error = rw_verify_area(WRITE, new_file, &new_pos, len);
> + if (error)
> + goto out_fput;
> +
> /* Try to use clone_file_range to clone up within the same fs */
> ovl_start_write(dentry);
> cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0);
> @@ -309,6 +329,10 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
> }
> }
>
> + error = ovl_verify_area(old_pos, new_pos, this_len, len);
> + if (error)
> + break;
> +
> ovl_start_write(dentry);
> bytes = do_splice_direct(old_file, &old_pos,
> new_file, &new_pos,
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 02/16] splice: remove permission hook from do_splice_direct()
2023-11-22 12:27 ` [PATCH v2 02/16] splice: remove permission hook from do_splice_direct() Amir Goldstein
2023-11-23 7:36 ` Christoph Hellwig
@ 2023-11-23 16:28 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 16:28 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:01, Amir Goldstein wrote:
> All callers of do_splice_direct() have a call to rw_verify_area() for
> the entire range that is being copied, e.g. by vfs_copy_file_range() or
> do_sendfile() before calling do_splice_direct().
>
> The rw_verify_area() check inside do_splice_direct() is redundant and
> is called after sb_start_write(), so it is not "start-write-safe".
> Remove this redundant check.
>
> This is needed for fanotify "pre content" events.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/splice.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/fs/splice.c b/fs/splice.c
> index d983d375ff11..6e917db6f49a 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1166,6 +1166,7 @@ static void direct_file_splice_eof(struct splice_desc *sd)
> * (splice in + splice out, as compared to just sendfile()). So this helper
> * can splice directly through a process-private pipe.
> *
> + * Callers already called rw_verify_area() on the entire range.
> */
> long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
> loff_t *opos, size_t len, unsigned int flags)
> @@ -1187,10 +1188,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
> if (unlikely(out->f_flags & O_APPEND))
> return -EINVAL;
>
> - ret = rw_verify_area(WRITE, out, opos, len);
> - if (unlikely(ret < 0))
> - return ret;
> -
> ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
> if (ret > 0)
> *ppos = sd.pos;
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor()
2023-11-22 12:27 ` [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor() Amir Goldstein
2023-11-23 7:36 ` Christoph Hellwig
@ 2023-11-23 16:35 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 16:35 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel, Chuck Lever
On Wed 22-11-23 14:27:02, Amir Goldstein wrote:
> vfs_splice_read() has a permission hook inside rw_verify_area() and
> it is called from do_splice_direct() -> splice_direct_to_actor().
>
> The callers of do_splice_direct() (e.g. vfs_copy_file_range()) already
> call rw_verify_area() for the entire range, but the other caller of
> splice_direct_to_actor() (nfsd) does not.
>
> Add the rw_verify_area() checks in nfsd_splice_read() and use a
> variant of vfs_splice_read() without rw_verify_area() check in
> splice_direct_to_actor() to avoid the redundant rw_verify_area() checks.
>
> This is needed for fanotify "pre content" events.
>
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/nfsd/vfs.c | 5 ++++-
> fs/splice.c | 58 +++++++++++++++++++++++++++++++--------------------
> 2 files changed, 39 insertions(+), 24 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index fbbea7498f02..5d704461e3b4 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1046,7 +1046,10 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> ssize_t host_err;
>
> trace_nfsd_read_splice(rqstp, fhp, offset, *count);
> - host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
> + host_err = rw_verify_area(READ, file, &offset, *count);
> + if (!host_err)
> + host_err = splice_direct_to_actor(file, &sd,
> + nfsd_direct_splice_actor);
> return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
> }
>
> diff --git a/fs/splice.c b/fs/splice.c
> index 6e917db6f49a..6fc2c27e9520 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -944,27 +944,15 @@ static void do_splice_eof(struct splice_desc *sd)
> sd->splice_eof(sd);
> }
>
> -/**
> - * vfs_splice_read - Read data from a file and splice it into a pipe
> - * @in: File to splice from
> - * @ppos: Input file offset
> - * @pipe: Pipe to splice to
> - * @len: Number of bytes to splice
> - * @flags: Splice modifier flags (SPLICE_F_*)
> - *
> - * Splice the requested amount of data from the input file to the pipe. This
> - * is synchronous as the caller must hold the pipe lock across the entire
> - * operation.
> - *
> - * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
> - * a hole and a negative error code otherwise.
> +/*
> + * Callers already called rw_verify_area() on the entire range.
> + * No need to call it for sub ranges.
> */
> -long vfs_splice_read(struct file *in, loff_t *ppos,
> - struct pipe_inode_info *pipe, size_t len,
> - unsigned int flags)
> +static long do_splice_read(struct file *in, loff_t *ppos,
> + struct pipe_inode_info *pipe, size_t len,
> + unsigned int flags)
> {
> unsigned int p_space;
> - int ret;
>
> if (unlikely(!(in->f_mode & FMODE_READ)))
> return -EBADF;
> @@ -975,10 +963,6 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
> p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail);
> len = min_t(size_t, len, p_space << PAGE_SHIFT);
>
> - ret = rw_verify_area(READ, in, ppos, len);
> - if (unlikely(ret < 0))
> - return ret;
> -
> if (unlikely(len > MAX_RW_COUNT))
> len = MAX_RW_COUNT;
>
> @@ -992,6 +976,34 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
> return copy_splice_read(in, ppos, pipe, len, flags);
> return in->f_op->splice_read(in, ppos, pipe, len, flags);
> }
> +
> +/**
> + * vfs_splice_read - Read data from a file and splice it into a pipe
> + * @in: File to splice from
> + * @ppos: Input file offset
> + * @pipe: Pipe to splice to
> + * @len: Number of bytes to splice
> + * @flags: Splice modifier flags (SPLICE_F_*)
> + *
> + * Splice the requested amount of data from the input file to the pipe. This
> + * is synchronous as the caller must hold the pipe lock across the entire
> + * operation.
> + *
> + * If successful, it returns the amount of data spliced, 0 if it hit the EOF or
> + * a hole and a negative error code otherwise.
> + */
> +long vfs_splice_read(struct file *in, loff_t *ppos,
> + struct pipe_inode_info *pipe, size_t len,
> + unsigned int flags)
> +{
> + int ret;
> +
> + ret = rw_verify_area(READ, in, ppos, len);
> + if (unlikely(ret < 0))
> + return ret;
> +
> + return do_splice_read(in, ppos, pipe, len, flags);
> +}
> EXPORT_SYMBOL_GPL(vfs_splice_read);
>
> /**
> @@ -1066,7 +1078,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
> size_t read_len;
> loff_t pos = sd->pos, prev_pos = pos;
>
> - ret = vfs_splice_read(in, &pos, pipe, len, flags);
> + ret = do_splice_read(in, &pos, pipe, len, flags);
> if (unlikely(ret <= 0))
> goto read_failure;
>
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe()
2023-11-22 12:27 ` [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe() Amir Goldstein
2023-11-23 7:38 ` Christoph Hellwig
@ 2023-11-23 16:37 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 16:37 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:03, Amir Goldstein wrote:
> vfs_splice_read() has a permission hook inside rw_verify_area() and
> it is called from splice_file_to_pipe(), which is called from
> do_splice() and do_sendfile().
>
> do_sendfile() already has a rw_verify_area() check for the entire range.
> do_splice() has a rw_verify_check() for the splice to file case, not for
> the splice from file case.
>
> Add the rw_verify_area() check for splice from file case in do_splice()
> and use a variant of vfs_splice_read() without rw_verify_area() check
> in splice_file_to_pipe() to avoid the redundant rw_verify_area() checks.
>
> This is needed for fanotify "pre content" events.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/splice.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/splice.c b/fs/splice.c
> index 6fc2c27e9520..d4fdd44c0b32 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -1239,7 +1239,7 @@ long splice_file_to_pipe(struct file *in,
> pipe_lock(opipe);
> ret = wait_for_space(opipe, flags);
> if (!ret)
> - ret = vfs_splice_read(in, offset, opipe, len, flags);
> + ret = do_splice_read(in, offset, opipe, len, flags);
> pipe_unlock(opipe);
> if (ret > 0)
> wakeup_pipe_readers(opipe);
> @@ -1316,6 +1316,10 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out,
> offset = in->f_pos;
> }
>
> + ret = rw_verify_area(READ, in, &offset, len);
> + if (unlikely(ret < 0))
> + return ret;
> +
> if (out->f_flags & O_NONBLOCK)
> flags |= SPLICE_F_NONBLOCK;
>
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range()
2023-11-22 12:27 ` [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range() Amir Goldstein
2023-11-23 7:47 ` Christoph Hellwig
@ 2023-11-23 16:52 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 16:52 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:05, Amir Goldstein wrote:
> In many of the vfs helpers, file permission hook is called before
> taking sb_start_write(), making them "start-write-safe".
> do_clone_file_range() is an exception to this rule.
>
> do_clone_file_range() has two callers - vfs_clone_file_range() and
> overlayfs. Move remap_verify_area() checks from do_clone_file_range()
> out to vfs_clone_file_range() to make them "start-write-safe".
>
> Overlayfs already has calls to rw_verify_area() with the same security
> permission hooks as remap_verify_area() has.
> The rest of the checks in remap_verify_area() are irrelevant for
> overlayfs that calls do_clone_file_range() offset 0 and positive length.
>
> This is needed for fanotify "pre content" events.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/remap_range.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/remap_range.c b/fs/remap_range.c
> index 87ae4f0dc3aa..42f79cb2b1b1 100644
> --- a/fs/remap_range.c
> +++ b/fs/remap_range.c
> @@ -385,14 +385,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
> if (!file_in->f_op->remap_file_range)
> return -EOPNOTSUPP;
>
> - ret = remap_verify_area(file_in, pos_in, len, false);
> - if (ret)
> - return ret;
> -
> - ret = remap_verify_area(file_out, pos_out, len, true);
> - if (ret)
> - return ret;
> -
> ret = file_in->f_op->remap_file_range(file_in, pos_in,
> file_out, pos_out, len, remap_flags);
> if (ret < 0)
> @@ -410,6 +402,14 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> {
> loff_t ret;
>
> + ret = remap_verify_area(file_in, pos_in, len, false);
> + if (ret)
> + return ret;
> +
> + ret = remap_verify_area(file_out, pos_out, len, true);
> + if (ret)
> + return ret;
> +
> file_start_write(file_out);
> ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
> remap_flags);
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook
2023-11-22 12:27 ` [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook Amir Goldstein
2023-11-23 7:48 ` Christoph Hellwig
@ 2023-11-23 16:53 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 16:53 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:06, Amir Goldstein wrote:
> In vfs code, file_start_write() is usually called after the permission
> hook in rw_verify_area(). vfs_dedupe_file_range_one() is an exception
> to this rule.
>
> In vfs_dedupe_file_range_one(), move file_start_write() to after the
> the rw_verify_area() checks to make them "start-write-safe".
>
> This is needed for fanotify "pre content" events.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/remap_range.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/fs/remap_range.c b/fs/remap_range.c
> index 42f79cb2b1b1..12131f2a6c9e 100644
> --- a/fs/remap_range.c
> +++ b/fs/remap_range.c
> @@ -420,7 +420,7 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
> EXPORT_SYMBOL(vfs_clone_file_range);
>
> /* Check whether we are allowed to dedupe the destination file */
> -static bool allow_file_dedupe(struct file *file)
> +static bool may_dedupe_file(struct file *file)
> {
> struct mnt_idmap *idmap = file_mnt_idmap(file);
> struct inode *inode = file_inode(file);
> @@ -445,24 +445,29 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
> WARN_ON_ONCE(remap_flags & ~(REMAP_FILE_DEDUP |
> REMAP_FILE_CAN_SHORTEN));
>
> - ret = mnt_want_write_file(dst_file);
> - if (ret)
> - return ret;
> -
> /*
> * This is redundant if called from vfs_dedupe_file_range(), but other
> * callers need it and it's not performance sesitive...
> */
> ret = remap_verify_area(src_file, src_pos, len, false);
> if (ret)
> - goto out_drop_write;
> + return ret;
>
> ret = remap_verify_area(dst_file, dst_pos, len, true);
> if (ret)
> - goto out_drop_write;
> + return ret;
> +
> + /*
> + * This needs to be called after remap_verify_area() because of
> + * sb_start_write() and before may_dedupe_file() because the mount's
> + * MAY_WRITE need to be checked with mnt_get_write_access_file() held.
> + */
> + ret = mnt_want_write_file(dst_file);
> + if (ret)
> + return ret;
>
> ret = -EPERM;
> - if (!allow_file_dedupe(dst_file))
> + if (!may_dedupe_file(dst_file))
> goto out_drop_write;
>
> ret = -EXDEV;
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-23 16:22 ` Christian Brauner
@ 2023-11-23 16:53 ` Amir Goldstein
2023-11-23 17:56 ` Christian Brauner
0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-23 16:53 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Thu, Nov 23, 2023 at 6:22 PM Christian Brauner <brauner@kernel.org> wrote:
>
> > > diff --git a/fs/splice.c b/fs/splice.c
> > > index d983d375ff1130..982a0872fa03e9 100644
> > > --- a/fs/splice.c
> > > +++ b/fs/splice.c
> > > @@ -684,6 +684,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> > >
> > > splice_from_pipe_begin(&sd);
> > > while (sd.total_len) {
> > > + struct kiocb kiocb;
> > > struct iov_iter from;
> > > unsigned int head, tail, mask;
> > > size_t left;
> > > @@ -733,7 +734,10 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> > > }
> > >
> > > iov_iter_bvec(&from, ITER_SOURCE, array, n, sd.total_len - left);
> > > - ret = vfs_iter_write(out, &from, &sd.pos, 0);
> > > + init_sync_kiocb(&kiocb, out);
> > > + kiocb.ki_pos = sd.pos;
> > > + ret = out->f_op->write_iter(&kiocb, &from);
> > > + sd.pos = kiocb.ki_pos;
> > > if (ret <= 0)
> > > break;
> > >
> >
> > Are we open coding call_write_iter() now?
> > Is that a trend that I am not aware of?
>
> I'll fold that in as-is but I'll use call_write_iter() for now.
> We can remove that later. For now consistency matters more.
Stating the obvious - please don't forget to edit the commit message
removing mention of the helper.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] btrfs: move file_start_write() to after permission hook
2023-11-22 12:27 ` [PATCH v2 08/16] btrfs: " Amir Goldstein
2023-11-23 7:48 ` Christoph Hellwig
@ 2023-11-23 16:54 ` Jan Kara
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 16:54 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:07, Amir Goldstein wrote:
> In vfs code, file_start_write() is usually called after the permission
> hook in rw_verify_area(). btrfs_ioctl_encoded_write() in an exception
> to this rule.
>
> Move file_start_write() to after the rw_verify_area() check in encoded
> write to make the permission hook "start-write-safe".
>
> This is needed for fanotify "pre content" events.
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/btrfs/ioctl.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index dfe257e1845b..0a7850c4be67 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4523,29 +4523,29 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool
> if (ret < 0)
> goto out_acct;
>
> - file_start_write(file);
> -
> if (iov_iter_count(&iter) == 0) {
> ret = 0;
> - goto out_end_write;
> + goto out_iov;
> }
> pos = args.offset;
> ret = rw_verify_area(WRITE, file, &pos, args.len);
> if (ret < 0)
> - goto out_end_write;
> + goto out_iov;
>
> init_sync_kiocb(&kiocb, file);
> ret = kiocb_set_rw_flags(&kiocb, 0);
> if (ret)
> - goto out_end_write;
> + goto out_iov;
> kiocb.ki_pos = pos;
>
> + file_start_write(file);
> +
> ret = btrfs_do_write_iter(&kiocb, &iter, &args);
> if (ret > 0)
> fsnotify_modify(file);
>
> -out_end_write:
> file_end_write(file);
> +out_iov:
> kfree(iov);
> out_acct:
> if (ret > 0)
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 11/16] fs: move permission hook out of do_iter_write()
2023-11-22 12:27 ` [PATCH v2 11/16] fs: move permission hook out of do_iter_write() Amir Goldstein
2023-11-22 14:33 ` Christian Brauner
@ 2023-11-23 17:08 ` Jan Kara
2023-11-24 8:45 ` Amir Goldstein
1 sibling, 1 reply; 55+ messages in thread
From: Jan Kara @ 2023-11-23 17:08 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:10, Amir Goldstein wrote:
> In many of the vfs helpers, the rw_verity_area() checks are called before
^^ verify
> taking sb_start_write(), making them "start-write-safe".
> do_iter_write() is an exception to this rule.
>
> do_iter_write() has two callers - vfs_iter_write() and vfs_writev().
> Move rw_verify_area() and other checks from do_iter_write() out to
> its callers to make them "start-write-safe".
>
> Move also the fsnotify_modify() hook to align with similar pattern
> used in vfs_write() and other vfs helpers.
>
> This is needed for fanotify "pre content" events.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Just one more nit below. Otherwise feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 87ca50f16a23..6c40468efe19 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -852,28 +852,10 @@ EXPORT_SYMBOL(vfs_iter_read);
> static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
> loff_t *pos, rwf_t flags)
> {
> - size_t tot_len;
> - ssize_t ret = 0;
> -
> - if (!(file->f_mode & FMODE_WRITE))
> - return -EBADF;
> - if (!(file->f_mode & FMODE_CAN_WRITE))
> - return -EINVAL;
> -
> - tot_len = iov_iter_count(iter);
> - if (!tot_len)
> - return 0;
> - ret = rw_verify_area(WRITE, file, pos, tot_len);
> - if (ret < 0)
> - return ret;
> -
> if (file->f_op->write_iter)
> - ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
> - else
> - ret = do_loop_readv_writev(file, iter, pos, WRITE, flags);
> - if (ret > 0)
> - fsnotify_modify(file);
> - return ret;
> + return do_iter_readv_writev(file, iter, pos, WRITE, flags);
> +
> + return do_loop_readv_writev(file, iter, pos, WRITE, flags);
> }
Since do_iter_write() is now trivial and has only two callers, one of which
has made sure file->f_op->write_iter != NULL, can we perhaps just fold this
into the callers? One less wrapper in the maze...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 12/16] fs: move permission hook out of do_iter_read()
2023-11-22 12:27 ` [PATCH v2 12/16] fs: move permission hook out of do_iter_read() Amir Goldstein
@ 2023-11-23 17:13 ` Jan Kara
2023-11-24 8:48 ` Amir Goldstein
0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2023-11-23 17:13 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:11, Amir Goldstein wrote:
> We recently moved fsnotify hook, rw_verify_area() and other checks from
> do_iter_write() out to its two callers.
>
> for consistency, do the same thing for do_iter_read() - move the
^^^ For
> rw_verify_area() checks and fsnotify hook to the callers vfs_iter_read()
> and vfs_readv().
>
> This aligns those vfs helpers with the pattern used in vfs_read() and
> vfs_iocb_iter_read() and the vfs write helpers, where all the checks are
> in the vfs helpers and the do_* or call_* helpers do the work.
>
> This is needed for fanotify "pre content" events.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Also one nit below. Otherwise feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
> +/*
> + * Low-level helpers don't perform rw sanity checks.
> + * The caller is responsible for that.
> + */
> static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
> - loff_t *pos, rwf_t flags)
> + loff_t *pos, rwf_t flags)
> +{
> + if (file->f_op->read_iter)
> + return do_iter_readv_writev(file, iter, pos, READ, flags);
> +
> + return do_loop_readv_writev(file, iter, pos, READ, flags);
> +}
Similarly as with do_iter_write() I don't think there's much point in this
helper when there's actually only one real user, the other one can just
call do_iter_readv_writev().
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 13/16] fs: move kiocb_start_write() into vfs_iocb_iter_write()
2023-11-22 12:27 ` [PATCH v2 13/16] fs: move kiocb_start_write() into vfs_iocb_iter_write() Amir Goldstein
@ 2023-11-23 17:30 ` Jan Kara
0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 17:30 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:12, Amir Goldstein wrote:
> In vfs code, sb_start_write() is usually called after the permission hook
> in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule,
> where kiocb_start_write() is called by its callers.
>
> Move kiocb_start_write() from the callers into vfs_iocb_iter_write()
> after the rw_verify_area() checks, to make them "start-write-safe".
>
> The semantics of vfs_iocb_iter_write() is changed, so that the caller is
> responsible for calling kiocb_end_write() on completion only if async
> iocb was queued. The completion handlers of both callers were adapted
> to this semantic change.
>
> This is needed for fanotify "pre content" events.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Suggested-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 14/16] fs: create __sb_write_started() helper
2023-11-22 12:27 ` [PATCH v2 14/16] fs: create __sb_write_started() helper Amir Goldstein
@ 2023-11-23 17:31 ` Jan Kara
2023-11-24 9:14 ` Amir Goldstein
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-11-23 17:31 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:13, Amir Goldstein wrote:
> Similar to sb_write_started() for use by other sb freeze levels.
>
> Unlike the boolean sb_write_started(), this helper returns a tristate
> to distiguish the cases of lockdep disabled or unknown lock state.
>
> This is needed for fanotify "pre content" events.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> include/linux/fs.h | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 98b7a7a8c42e..e8aa48797bf4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1645,9 +1645,22 @@ static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
> #define __sb_writers_release(sb, lev) \
> percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
>
> +/**
> + * __sb_write_started - check if sb freeze level is held
> + * @sb: the super we write to
> + *
> + * > 0 sb freeze level is held
> + * 0 sb freeze level is not held
> + * < 0 !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN
> + */
> +static inline int __sb_write_started(const struct super_block *sb, int level)
> +{
> + return lockdep_is_held_type(sb->s_writers.rw_sem + level - 1, 1);
> +}
> +
> static inline bool sb_write_started(const struct super_block *sb)
> {
> - return lockdep_is_held_type(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1, 1);
> + return __sb_write_started(sb, SB_FREEZE_WRITE);
> }
>
> /**
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers
2023-11-22 12:27 ` [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers Amir Goldstein
@ 2023-11-23 17:35 ` Jan Kara
2023-11-24 8:20 ` Amir Goldstein
0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2023-11-23 17:35 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christian Brauner, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Wed 22-11-23 14:27:15, Amir Goldstein wrote:
> Create new helpers {sb,file}_write_not_started() that can be used
> to assert that sb_start_write() is not held.
>
> This is needed for fanotify "pre content" events.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
I'm not against this but I'm somewhat wondering, where exactly do you plan
to use this :) (does not seem to be in this patch set). Because one easily
forgets about the subtle implementation details and uses
!sb_write_started() instead of sb_write_not_started()...
Honza
> ---
> include/linux/fs.h | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 05780d993c7d..c085172f832f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1669,6 +1669,17 @@ static inline bool sb_write_started(const struct super_block *sb)
> return __sb_write_started(sb, SB_FREEZE_WRITE);
> }
>
> +/**
> + * sb_write_not_started - check if SB_FREEZE_WRITE is not held
> + * @sb: the super we write to
> + *
> + * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
> + */
> +static inline bool sb_write_not_started(const struct super_block *sb)
> +{
> + return __sb_write_started(sb, SB_FREEZE_WRITE) <= 0;
> +}
> +
> /**
> * file_write_started - check if SB_FREEZE_WRITE is held
> * @file: the file we write to
> @@ -1684,6 +1695,21 @@ static inline bool file_write_started(const struct file *file)
> return sb_write_started(file_inode(file)->i_sb);
> }
>
> +/**
> + * file_write_not_started - check if SB_FREEZE_WRITE is not held
> + * @file: the file we write to
> + *
> + * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
> + * May be false positive with !S_ISREG, because file_start_write() has
> + * no effect on !S_ISREG.
> + */
> +static inline bool file_write_not_started(const struct file *file)
> +{
> + if (!S_ISREG(file_inode(file)->i_mode))
> + return true;
> + return sb_write_not_started(file_inode(file)->i_sb);
> +}
> +
> /**
> * sb_end_write - drop write access to a superblock
> * @sb: the super we wrote to
> --
> 2.34.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write()
2023-11-23 16:53 ` Amir Goldstein
@ 2023-11-23 17:56 ` Christian Brauner
0 siblings, 0 replies; 55+ messages in thread
From: Christian Brauner @ 2023-11-23 17:56 UTC (permalink / raw)
To: Amir Goldstein
Cc: Christoph Hellwig, Jan Kara, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
> Stating the obvious - please don't forget to edit the commit message
> removing mention of the helper.
Done. See vfs.rw.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers
2023-11-23 17:35 ` Jan Kara
@ 2023-11-24 8:20 ` Amir Goldstein
2023-12-01 10:11 ` Jan Kara
0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-11-24 8:20 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, Josef Bacik, David Howells, Jens Axboe,
Miklos Szeredi, Al Viro, linux-fsdevel
On Fri, Nov 24, 2023 at 6:17 AM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 22-11-23 14:27:15, Amir Goldstein wrote:
> > Create new helpers {sb,file}_write_not_started() that can be used
> > to assert that sb_start_write() is not held.
> >
> > This is needed for fanotify "pre content" events.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> I'm not against this but I'm somewhat wondering, where exactly do you plan
> to use this :) (does not seem to be in this patch set).
As I wrote in the cover letter:
"The last 3 patches are helpers that I used in fanotify patches to
assert that permission hooks are called with expected locking scope."
But this is just half of the story.
The full story is that I added it in fsnotify_file_perm() hook to check
that we caught all the places that permission hook was called with
sb_writers held:
static inline int fsnotify_file_perm(struct file *file, int mask)
{
struct inode *inode = file_inode(file);
__u32 fsnotify_mask;
/*
* Content of file may be written on pre-content events, so sb freeze
* protection must not be held.
*/
lockdep_assert_once(file_write_not_started(file));
/*
* Pre-content events are only reported for regular files and dirs.
*/
if (mask & MAY_READ) {
And the assert triggered in a nested overlay case (overlay over overlay).
So I cannot keep the assert in the final patch as is.
I can probably move it into (mask & MAY_WRITE) case, because
I don't know of any existing write permission hook that is called with
sb_wrtiers held.
I also plan to use sb_write_not_started() in fsnotify_lookup_perm().
I think that:
"This is needed for fanotify "pre content" events."
sums this up nicely without getting into gory details ;)
> Because one easily
> forgets about the subtle implementation details and uses
> !sb_write_started() instead of sb_write_not_started()...
>
I think I had a comment in one version that said:
"This is NOT the same as !sb_write_started()"
We can add it back if you think it is useful, but FWIW, anyone
can use !sb_write_started() wrongly today whether we add
sb_write_not_started() or not.
But this would be pretty easy to detect - running a build without
CONFIG_LOCKDEP will catch this misuse pretty quickly.
Thanks,
Amir.
>
> > ---
> > include/linux/fs.h | 26 ++++++++++++++++++++++++++
> > 1 file changed, 26 insertions(+)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 05780d993c7d..c085172f832f 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1669,6 +1669,17 @@ static inline bool sb_write_started(const struct super_block *sb)
> > return __sb_write_started(sb, SB_FREEZE_WRITE);
> > }
> >
> > +/**
> > + * sb_write_not_started - check if SB_FREEZE_WRITE is not held
> > + * @sb: the super we write to
> > + *
> > + * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
> > + */
> > +static inline bool sb_write_not_started(const struct super_block *sb)
> > +{
> > + return __sb_write_started(sb, SB_FREEZE_WRITE) <= 0;
> > +}
> > +
> > /**
> > * file_write_started - check if SB_FREEZE_WRITE is held
> > * @file: the file we write to
> > @@ -1684,6 +1695,21 @@ static inline bool file_write_started(const struct file *file)
> > return sb_write_started(file_inode(file)->i_sb);
> > }
> >
> > +/**
> > + * file_write_not_started - check if SB_FREEZE_WRITE is not held
> > + * @file: the file we write to
> > + *
> > + * May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN.
> > + * May be false positive with !S_ISREG, because file_start_write() has
> > + * no effect on !S_ISREG.
> > + */
> > +static inline bool file_write_not_started(const struct file *file)
> > +{
> > + if (!S_ISREG(file_inode(file)->i_mode))
> > + return true;
> > + return sb_write_not_started(file_inode(file)->i_sb);
> > +}
> > +
> > /**
> > * sb_end_write - drop write access to a superblock
> > * @sb: the super we wrote to
> > --
> > 2.34.1
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 11/16] fs: move permission hook out of do_iter_write()
2023-11-23 17:08 ` Jan Kara
@ 2023-11-24 8:45 ` Amir Goldstein
0 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-24 8:45 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, Josef Bacik, David Howells, Jens Axboe,
Miklos Szeredi, Al Viro, linux-fsdevel
On Fri, Nov 24, 2023 at 6:18 AM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 22-11-23 14:27:10, Amir Goldstein wrote:
> > In many of the vfs helpers, the rw_verity_area() checks are called before
> ^^ verify
>
> > taking sb_start_write(), making them "start-write-safe".
> > do_iter_write() is an exception to this rule.
> >
> > do_iter_write() has two callers - vfs_iter_write() and vfs_writev().
> > Move rw_verify_area() and other checks from do_iter_write() out to
> > its callers to make them "start-write-safe".
> >
> > Move also the fsnotify_modify() hook to align with similar pattern
> > used in vfs_write() and other vfs helpers.
> >
> > This is needed for fanotify "pre content" events.
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> Just one more nit below. Otherwise feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
>
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 87ca50f16a23..6c40468efe19 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -852,28 +852,10 @@ EXPORT_SYMBOL(vfs_iter_read);
> > static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
> > loff_t *pos, rwf_t flags)
> > {
> > - size_t tot_len;
> > - ssize_t ret = 0;
> > -
> > - if (!(file->f_mode & FMODE_WRITE))
> > - return -EBADF;
> > - if (!(file->f_mode & FMODE_CAN_WRITE))
> > - return -EINVAL;
> > -
> > - tot_len = iov_iter_count(iter);
> > - if (!tot_len)
> > - return 0;
> > - ret = rw_verify_area(WRITE, file, pos, tot_len);
> > - if (ret < 0)
> > - return ret;
> > -
> > if (file->f_op->write_iter)
> > - ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
> > - else
> > - ret = do_loop_readv_writev(file, iter, pos, WRITE, flags);
> > - if (ret > 0)
> > - fsnotify_modify(file);
> > - return ret;
> > + return do_iter_readv_writev(file, iter, pos, WRITE, flags);
> > +
> > + return do_loop_readv_writev(file, iter, pos, WRITE, flags);
> > }
>
> Since do_iter_write() is now trivial and has only two callers, one of which
> has made sure file->f_op->write_iter != NULL, can we perhaps just fold this
> into the callers? One less wrapper in the maze...
Yeh, nice cleanup.
Cristian,
Can you please fold this patch:
diff --git a/fs/read_write.c b/fs/read_write.c
index 6c40468efe19..46a90aa0ad56 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -849,15 +849,6 @@ ssize_t vfs_iter_read(struct file *file, struct
iov_iter *iter, loff_t *ppos,
}
EXPORT_SYMBOL(vfs_iter_read);
-static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
- loff_t *pos, rwf_t flags)
-{
- if (file->f_op->write_iter)
- return do_iter_readv_writev(file, iter, pos, WRITE, flags);
-
- return do_loop_readv_writev(file, iter, pos, WRITE, flags);
-}
-
ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb,
struct iov_iter *iter)
{
@@ -908,7 +899,7 @@ ssize_t vfs_iter_write(struct file *file, struct
iov_iter *iter, loff_t *ppos,
return ret;
file_start_write(file);
- ret = do_iter_write(file, iter, ppos, flags);
+ ret = do_iter_readv_writev(file, iter, ppos, WRITE, flags);
if (ret > 0)
fsnotify_modify(file);
file_end_write(file);
@@ -962,7 +953,10 @@ static ssize_t vfs_writev(struct file *file,
const struct iovec __user *vec,
goto out;
file_start_write(file);
- ret = do_iter_write(file, &iter, pos, flags);
+ if (file->f_op->write_iter)
+ ret = do_iter_readv_writev(file, &iter, pos, WRITE, flags);
+ else
+ ret = do_loop_readv_writev(file, &iter, pos, WRITE, flags);
if (ret > 0)
fsnotify_modify(file);
file_end_write(file);
--
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 12/16] fs: move permission hook out of do_iter_read()
2023-11-23 17:13 ` Jan Kara
@ 2023-11-24 8:48 ` Amir Goldstein
0 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-24 8:48 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, Josef Bacik, David Howells, Jens Axboe,
Miklos Szeredi, Al Viro, linux-fsdevel
On Fri, Nov 24, 2023 at 6:21 AM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 22-11-23 14:27:11, Amir Goldstein wrote:
> > We recently moved fsnotify hook, rw_verify_area() and other checks from
> > do_iter_write() out to its two callers.
> >
> > for consistency, do the same thing for do_iter_read() - move the
> ^^^ For
>
> > rw_verify_area() checks and fsnotify hook to the callers vfs_iter_read()
> > and vfs_readv().
> >
> > This aligns those vfs helpers with the pattern used in vfs_read() and
> > vfs_iocb_iter_read() and the vfs write helpers, where all the checks are
> > in the vfs helpers and the do_* or call_* helpers do the work.
> >
> > This is needed for fanotify "pre content" events.
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> Also one nit below. Otherwise feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> > +/*
> > + * Low-level helpers don't perform rw sanity checks.
> > + * The caller is responsible for that.
> > + */
> > static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
> > - loff_t *pos, rwf_t flags)
> > + loff_t *pos, rwf_t flags)
> > +{
> > + if (file->f_op->read_iter)
> > + return do_iter_readv_writev(file, iter, pos, READ, flags);
> > +
> > + return do_loop_readv_writev(file, iter, pos, READ, flags);
> > +}
>
> Similarly as with do_iter_write() I don't think there's much point in this
> helper when there's actually only one real user, the other one can just
> call do_iter_readv_writev().
Yeh, nice.
Christian,
Can you please fold this patch:
BTW, both patches need a very mild edit of commit message
to mention removal of do_iter_{read/write}() and the minor
spelling mistake fixes pointed out by Jan.
Thanks,
Amir.
diff --git a/fs/read_write.c b/fs/read_write.c
index 8358ace9282e..2953fea9b65b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -784,19 +784,6 @@ static ssize_t do_loop_readv_writev(struct file
*filp, struct iov_iter *iter,
return ret;
}
-/*
- * Low-level helpers don't perform rw sanity checks.
- * The caller is responsible for that.
- */
-static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
- loff_t *pos, rwf_t flags)
-{
- if (file->f_op->read_iter)
- return do_iter_readv_writev(file, iter, pos, READ, flags);
-
- return do_loop_readv_writev(file, iter, pos, READ, flags);
-}
-
ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
struct iov_iter *iter)
{
@@ -845,7 +832,7 @@ ssize_t vfs_iter_read(struct file *file, struct
iov_iter *iter, loff_t *ppos,
if (ret < 0)
return ret;
- ret = do_iter_read(file, iter, ppos, flags);
+ ret = do_iter_readv_writev(file, iter, ppos, READ, flags);
out:
if (ret >= 0)
fsnotify_access(file);
@@ -939,7 +926,10 @@ static ssize_t vfs_readv(struct file *file, const
struct iovec __user *vec,
if (ret < 0)
goto out;
- ret = do_iter_read(file, &iter, pos, flags);
+ if (file->f_op->read_iter)
+ ret = do_iter_readv_writev(file, &iter, pos, READ, flags);
+ else
+ ret = do_loop_readv_writev(file, &iter, pos, READ, flags);
out:
if (ret >= 0)
fsnotify_access(file);
--
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 14/16] fs: create __sb_write_started() helper
2023-11-22 12:27 ` [PATCH v2 14/16] fs: create __sb_write_started() helper Amir Goldstein
2023-11-23 17:31 ` Jan Kara
@ 2023-11-24 9:14 ` Amir Goldstein
1 sibling, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-11-24 9:14 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Josef Bacik, David Howells, Jens Axboe, Miklos Szeredi,
Al Viro, linux-fsdevel
On Wed, Nov 22, 2023 at 2:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Similar to sb_write_started() for use by other sb freeze levels.
>
> Unlike the boolean sb_write_started(), this helper returns a tristate
> to distiguish the cases of lockdep disabled or unknown lock state.
>
> This is needed for fanotify "pre content" events.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> include/linux/fs.h | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 98b7a7a8c42e..e8aa48797bf4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1645,9 +1645,22 @@ static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
> #define __sb_writers_release(sb, lev) \
> percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
>
> +/**
> + * __sb_write_started - check if sb freeze level is held
> + * @sb: the super we write to
Missing Kerneldoc of arg:
+ * @level: the freeze level
> + *
> + * > 0 sb freeze level is held
> + * 0 sb freeze level is not held
> + * < 0 !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN
> + */
> +static inline int __sb_write_started(const struct super_block *sb, int level)
> +{
> + return lockdep_is_held_type(sb->s_writers.rw_sem + level - 1, 1);
> +}
> +
> static inline bool sb_write_started(const struct super_block *sb)
> {
> - return lockdep_is_held_type(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1, 1);
> + return __sb_write_started(sb, SB_FREEZE_WRITE);
> }
>
> /**
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers
2023-11-24 8:20 ` Amir Goldstein
@ 2023-12-01 10:11 ` Jan Kara
0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2023-12-01 10:11 UTC (permalink / raw)
To: Amir Goldstein
Cc: Jan Kara, Christian Brauner, Josef Bacik, David Howells,
Jens Axboe, Miklos Szeredi, Al Viro, linux-fsdevel
On Fri 24-11-23 10:20:25, Amir Goldstein wrote:
> On Fri, Nov 24, 2023 at 6:17 AM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 22-11-23 14:27:15, Amir Goldstein wrote:
> > > Create new helpers {sb,file}_write_not_started() that can be used
> > > to assert that sb_start_write() is not held.
> > >
> > > This is needed for fanotify "pre content" events.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > I'm not against this but I'm somewhat wondering, where exactly do you plan
> > to use this :) (does not seem to be in this patch set).
>
> As I wrote in the cover letter:
> "The last 3 patches are helpers that I used in fanotify patches to
> assert that permission hooks are called with expected locking scope."
>
> But this is just half of the story.
>
> The full story is that I added it in fsnotify_file_perm() hook to check
> that we caught all the places that permission hook was called with
> sb_writers held:
>
> static inline int fsnotify_file_perm(struct file *file, int mask)
> {
> struct inode *inode = file_inode(file);
> __u32 fsnotify_mask;
>
> /*
> * Content of file may be written on pre-content events, so sb freeze
> * protection must not be held.
> */
> lockdep_assert_once(file_write_not_started(file));
>
> /*
> * Pre-content events are only reported for regular files and dirs.
> */
> if (mask & MAY_READ) {
>
>
> And the assert triggered in a nested overlay case (overlay over overlay).
> So I cannot keep the assert in the final patch as is.
> I can probably move it into (mask & MAY_WRITE) case, because
> I don't know of any existing write permission hook that is called with
> sb_wrtiers held.
>
> I also plan to use sb_write_not_started() in fsnotify_lookup_perm().
>
> I think that:
> "This is needed for fanotify "pre content" events."
> sums this up nicely without getting into gory details ;)
>
> > Because one easily
> > forgets about the subtle implementation details and uses
> > !sb_write_started() instead of sb_write_not_started()...
> >
>
> I think I had a comment in one version that said:
> "This is NOT the same as !sb_write_started()"
>
> We can add it back if you think it is useful, but FWIW, anyone
> can use !sb_write_started() wrongly today whether we add
> sb_write_not_started() or not.
>
> But this would be pretty easy to detect - running a build without
> CONFIG_LOCKDEP will catch this misuse pretty quickly.
Yeah, fair enough. Thanks for explanation!
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
end of thread, other threads:[~2023-12-01 10:12 UTC | newest]
Thread overview: 55+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-22 12:26 [PATCH v2 00/16] Tidy up file permission hooks Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 01/16] ovl: add permission hooks outside of do_splice_direct() Amir Goldstein
2023-11-23 7:35 ` Christoph Hellwig
2023-11-23 16:28 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 02/16] splice: remove permission hook from do_splice_direct() Amir Goldstein
2023-11-23 7:36 ` Christoph Hellwig
2023-11-23 16:28 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 03/16] splice: move permission hook out of splice_direct_to_actor() Amir Goldstein
2023-11-23 7:36 ` Christoph Hellwig
2023-11-23 16:35 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 04/16] splice: move permission hook out of splice_file_to_pipe() Amir Goldstein
2023-11-23 7:38 ` Christoph Hellwig
2023-11-23 16:37 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 05/16] splice: remove permission hook from iter_file_splice_write() Amir Goldstein
2023-11-23 7:47 ` Christoph Hellwig
2023-11-23 11:20 ` Amir Goldstein
2023-11-23 15:14 ` Christoph Hellwig
2023-11-23 15:41 ` Amir Goldstein
2023-11-23 15:45 ` Christoph Hellwig
2023-11-23 16:22 ` Christian Brauner
2023-11-23 16:53 ` Amir Goldstein
2023-11-23 17:56 ` Christian Brauner
2023-11-22 12:27 ` [PATCH v2 06/16] remap_range: move permission hooks out of do_clone_file_range() Amir Goldstein
2023-11-23 7:47 ` Christoph Hellwig
2023-11-23 16:52 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 07/16] remap_range: move file_start_write() to after permission hook Amir Goldstein
2023-11-23 7:48 ` Christoph Hellwig
2023-11-23 16:53 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 08/16] btrfs: " Amir Goldstein
2023-11-23 7:48 ` Christoph Hellwig
2023-11-23 16:54 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 09/16] coda: change locking order in coda_file_write_iter() Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 10/16] fs: move file_start_write() into vfs_iter_write() Amir Goldstein
2023-11-23 7:50 ` Christoph Hellwig
2023-11-23 8:04 ` Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 11/16] fs: move permission hook out of do_iter_write() Amir Goldstein
2023-11-22 14:33 ` Christian Brauner
2023-11-22 15:25 ` Amir Goldstein
2023-11-23 17:08 ` Jan Kara
2023-11-24 8:45 ` Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 12/16] fs: move permission hook out of do_iter_read() Amir Goldstein
2023-11-23 17:13 ` Jan Kara
2023-11-24 8:48 ` Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 13/16] fs: move kiocb_start_write() into vfs_iocb_iter_write() Amir Goldstein
2023-11-23 17:30 ` Jan Kara
2023-11-22 12:27 ` [PATCH v2 14/16] fs: create __sb_write_started() helper Amir Goldstein
2023-11-23 17:31 ` Jan Kara
2023-11-24 9:14 ` Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 15/16] fs: create file_write_started() helper Amir Goldstein
2023-11-22 12:27 ` [PATCH v2 16/16] fs: create {sb,file}_write_not_started() helpers Amir Goldstein
2023-11-23 17:35 ` Jan Kara
2023-11-24 8:20 ` Amir Goldstein
2023-12-01 10:11 ` Jan Kara
2023-11-22 14:06 ` [PATCH v2 00/16] Tidy up file permission hooks Josef Bacik
2023-11-22 15:04 ` Christian Brauner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).