linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Stash overlay real upper file in backing_file
@ 2024-10-04 10:23 Amir Goldstein
  2024-10-04 10:23 ` [PATCH 1/4] ovl: do not open non-data lower file for fsync Amir Goldstein
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-10-04 10:23 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

Hi all,

Al Viro posted a proposal to cleanup overlayfs handling of temporary
cloned real file references [1].

This is a proposal in the opposite direction to get rid of the temporary
cloned file references, because they are inefficient and cause for ugly
subtle code.

Al, I think that with these changes, overlayfs no longer has value in
using the proposed fderr struct?

FWIW, struct backing_file has no dedicated memcache pool - before
Christian's diet to struct file, struct backing_file was 248 bytes on x86
and now it is 200 bytes, so the addition of 8 more bytes to strucy
backing_file changes nothing wrt memory usage.

Miklos,

The implementation of ovl_real_file() helper is roughly based on
ovl_real_dir_file().

do you see any problems with this approach or any races not handled?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20241003234534.GM4017910@ZenIV/

Amir Goldstein (4):
  ovl: do not open non-data lower file for fsync
  ovl: stash upper real file in backing_file struct
  ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta()
  ovl: convert ovl_real_fdget() callers to ovl_real_file()

 fs/file_table.c     |   7 ++
 fs/internal.h       |   6 ++
 fs/overlayfs/file.c | 230 +++++++++++++++++++++++---------------------
 3 files changed, 133 insertions(+), 110 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] ovl: do not open non-data lower file for fsync
  2024-10-04 10:23 [PATCH 0/4] Stash overlay real upper file in backing_file Amir Goldstein
@ 2024-10-04 10:23 ` Amir Goldstein
  2024-10-04 22:16   ` Al Viro
  2024-10-04 10:23 ` [PATCH 2/4] ovl: stash upper real file in backing_file struct Amir Goldstein
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-10-04 10:23 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

ovl_fsync() with !datasync opens a backing file from the top most dentry
in the stack, checks if this dentry is non-upper and skips the fsync.

In case of an overlay dentry stack with lower data and lower metadata
above it, but without an upper metadata above it, the backing file is
opened from the top most lower metadata dentry and never used.

Fix the helper ovl_real_fdget_meta() to return an empty struct fd in
that case to avoid the unneeded backing file open.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4504493b20be..3d64d00ef981 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -90,17 +90,19 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 }
 
 static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
-			       bool allow_meta)
+			       bool upper_meta)
 {
 	struct dentry *dentry = file_dentry(file);
 	struct file *realfile = file->private_data;
 	struct path realpath;
 	int err;
 
-	real->word = (unsigned long)realfile;
+	real->word = 0;
 
-	if (allow_meta) {
-		ovl_path_real(dentry, &realpath);
+	if (upper_meta) {
+		ovl_path_upper(dentry, &realpath);
+		if (!realpath.dentry)
+			return 0;
 	} else {
 		/* lazy lookup and verify of lowerdata */
 		err = ovl_verify_lowerdata(dentry);
@@ -395,7 +397,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 
 	ret = ovl_real_fdget_meta(file, &real, !datasync);
-	if (ret)
+	if (ret || fd_empty(real))
 		return ret;
 
 	/* Don't sync lower file for fear of receiving EROFS error */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] ovl: stash upper real file in backing_file struct
  2024-10-04 10:23 [PATCH 0/4] Stash overlay real upper file in backing_file Amir Goldstein
  2024-10-04 10:23 ` [PATCH 1/4] ovl: do not open non-data lower file for fsync Amir Goldstein
@ 2024-10-04 10:23 ` Amir Goldstein
  2024-10-04 10:23 ` [PATCH 3/4] ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta() Amir Goldstein
  2024-10-04 10:23 ` [PATCH 4/4] ovl: convert ovl_real_fdget() callers to ovl_real_file() Amir Goldstein
  3 siblings, 0 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-10-04 10:23 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

When an overlayfs file is opened as lower and then the file is copied up,
every operation on the overlayfs open file will open a temporary backing
file to the upper dentry and close it at the end of the operation.

The original (lower) real file is stored in file->private_data pointer.
We could have allocated a struct ovl_real_file to potentially store two
backing files, the lower and the upper, but the original backing file
struct is not very space optimized (it has no memcache pool), so add a
private_data pointer to the backing_file struct and store the optional
second backing upper file in there instead of opening a temporary upper
file on every operation.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/file_table.c     |  7 +++++++
 fs/internal.h       |  6 ++++++
 fs/overlayfs/file.c | 48 ++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index eed5ffad9997..1c2c08a5a66a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -47,6 +47,7 @@ static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 struct backing_file {
 	struct file file;
 	struct path user_path;
+	void *private_data;
 };
 
 static inline struct backing_file *backing_file(struct file *f)
@@ -60,6 +61,12 @@ struct path *backing_file_user_path(struct file *f)
 }
 EXPORT_SYMBOL_GPL(backing_file_user_path);
 
+void **backing_file_private_ptr(struct file *f)
+{
+	return &backing_file(f)->private_data;
+}
+EXPORT_SYMBOL_GPL(backing_file_private_ptr);
+
 static inline void file_free(struct file *f)
 {
 	security_file_free(f);
diff --git a/fs/internal.h b/fs/internal.h
index 8c1b7acbbe8f..b1152a3e8ba2 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -100,6 +100,12 @@ extern void chroot_fs_refs(const struct path *, const struct path *);
 struct file *alloc_empty_file(int flags, const struct cred *cred);
 struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
 struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
+void **backing_file_private_ptr(struct file *f);
+
+static inline void *backing_file_private(struct file *f)
+{
+	return READ_ONCE(*backing_file_private_ptr(f));
+}
 
 static inline void file_put_write_access(struct file *file)
 {
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 3d64d00ef981..60a543b9a834 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -14,6 +14,8 @@
 #include <linux/backing-file.h>
 #include "overlayfs.h"
 
+#include "../internal.h"	/* for backing_file_private{,_ptr}() */
+
 static char ovl_whatisit(struct inode *inode, struct inode *realinode)
 {
 	if (realinode != ovl_inode_upper(inode))
@@ -94,6 +96,7 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 {
 	struct dentry *dentry = file_dentry(file);
 	struct file *realfile = file->private_data;
+	struct file *upperfile = backing_file_private(realfile);
 	struct path realpath;
 	int err;
 
@@ -114,15 +117,37 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 	if (!realpath.dentry)
 		return -EIO;
 
-	/* Has it been copied up since we'd opened it? */
+stashed_upper:
+	if (upperfile && file_inode(upperfile) == d_inode(realpath.dentry))
+		realfile = upperfile;
+
+	/*
+	 * If realfile is lower and has been copied up since we'd opened it,
+	 * open the real upper file and stash it in backing_file_private().
+	 */
 	if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
-		struct file *f = ovl_open_realfile(file, &realpath);
-		if (IS_ERR(f))
-			return PTR_ERR(f);
-		real->word = (unsigned long)f | FDPUT_FPUT;
-		return 0;
+		struct file *old;
+
+		/* Stashed upperfile has a mismatched inode */
+		if (unlikely(upperfile))
+			return -EIO;
+
+		upperfile = ovl_open_realfile(file, &realpath);
+		if (IS_ERR(upperfile))
+			return PTR_ERR(upperfile);
+
+		old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
+				      upperfile);
+		if (old) {
+			fput(upperfile);
+			upperfile = old;
+		}
+
+		goto stashed_upper;
 	}
 
+	real->word = (unsigned long)realfile;
+
 	/* Did the flags change since open? */
 	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
 		return ovl_change_flags(realfile, file->f_flags);
@@ -177,7 +202,16 @@ static int ovl_open(struct inode *inode, struct file *file)
 
 static int ovl_release(struct inode *inode, struct file *file)
 {
-	fput(file->private_data);
+	struct file *realfile = file->private_data;
+	struct file *upperfile = backing_file_private(realfile);
+
+	fput(realfile);
+	/*
+	 * If realfile is lower and file was copied up and accessed, we need
+	 * to put reference also on the stashed real upperfile.
+	 */
+	if (upperfile)
+		fput(upperfile);
 
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta()
  2024-10-04 10:23 [PATCH 0/4] Stash overlay real upper file in backing_file Amir Goldstein
  2024-10-04 10:23 ` [PATCH 1/4] ovl: do not open non-data lower file for fsync Amir Goldstein
  2024-10-04 10:23 ` [PATCH 2/4] ovl: stash upper real file in backing_file struct Amir Goldstein
@ 2024-10-04 10:23 ` Amir Goldstein
  2024-10-04 22:23   ` Al Viro
  2024-10-04 10:23 ` [PATCH 4/4] ovl: convert ovl_real_fdget() callers to ovl_real_file() Amir Goldstein
  3 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-10-04 10:23 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

Stop using struct fd to return a real file from ovl_real_fdget_meta(),
because we no longer return a temporary file object and the callers
always get a borrowed file reference.

Rename the helper to ovl_real_file_meta(), return a borrowed reference
of the real file that is referenced from the overlayfs file or an error.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 62 ++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 60a543b9a834..d383ff22ccdb 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -91,8 +91,7 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 	return 0;
 }
 
-static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
-			       bool upper_meta)
+static struct file *ovl_real_file_meta(const struct file *file, bool upper_meta)
 {
 	struct dentry *dentry = file_dentry(file);
 	struct file *realfile = file->private_data;
@@ -100,22 +99,20 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 	struct path realpath;
 	int err;
 
-	real->word = 0;
-
 	if (upper_meta) {
 		ovl_path_upper(dentry, &realpath);
 		if (!realpath.dentry)
-			return 0;
+			return NULL;
 	} else {
 		/* lazy lookup and verify of lowerdata */
 		err = ovl_verify_lowerdata(dentry);
 		if (err)
-			return err;
+			return ERR_PTR(err);
 
 		ovl_path_realdata(dentry, &realpath);
 	}
 	if (!realpath.dentry)
-		return -EIO;
+		return ERR_PTR(-EIO);
 
 stashed_upper:
 	if (upperfile && file_inode(upperfile) == d_inode(realpath.dentry))
@@ -130,11 +127,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 
 		/* Stashed upperfile has a mismatched inode */
 		if (unlikely(upperfile))
-			return -EIO;
+			return ERR_PTR(-EIO);
 
 		upperfile = ovl_open_realfile(file, &realpath);
 		if (IS_ERR(upperfile))
-			return PTR_ERR(upperfile);
+			return upperfile;
 
 		old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
 				      upperfile);
@@ -146,26 +143,31 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 		goto stashed_upper;
 	}
 
-	real->word = (unsigned long)realfile;
-
 	/* Did the flags change since open? */
-	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
-		return ovl_change_flags(realfile, file->f_flags);
+	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS)) {
+		err = ovl_change_flags(realfile, file->f_flags);
+		if (err)
+			return ERR_PTR(err);
+	}
 
-	return 0;
+	return realfile;
 }
 
-static int ovl_real_fdget(const struct file *file, struct fd *real)
+static struct file *ovl_real_file(const struct file *file)
 {
-	if (d_is_dir(file_dentry(file))) {
-		struct file *f = ovl_dir_real_file(file, false);
-		if (IS_ERR(f))
-			return PTR_ERR(f);
-		real->word = (unsigned long)f;
-		return 0;
-	}
+	if (d_is_dir(file_dentry(file)))
+		return ovl_dir_real_file(file, false);
 
-	return ovl_real_fdget_meta(file, real, false);
+	return ovl_real_file_meta(file, false);
+}
+
+static int ovl_real_fdget(const struct file *file, struct fd *real)
+{
+	struct file *f = ovl_real_file(file, false);
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+	real->word = (unsigned long)f;
+	return 0;
 }
 
 static int ovl_open(struct inode *inode, struct file *file)
@@ -422,7 +424,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	int ret;
 
@@ -430,19 +432,17 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	if (ret <= 0)
 		return ret;
 
-	ret = ovl_real_fdget_meta(file, &real, !datasync);
-	if (ret || fd_empty(real))
-		return ret;
+	realfile = ovl_real_file_meta(file, !datasync);
+	if (IS_ERR_OR_NULL(realfile))
+		return PTR_ERR(realfile);
 
 	/* Don't sync lower file for fear of receiving EROFS error */
-	if (file_inode(fd_file(real)) == ovl_inode_upper(file_inode(file))) {
+	if (file_inode(realfile) == ovl_inode_upper(file_inode(file))) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
-		ret = vfs_fsync_range(fd_file(real), start, end, datasync);
+		ret = vfs_fsync_range(realfile, start, end, datasync);
 		revert_creds(old_cred);
 	}
 
-	fdput(real);
-
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] ovl: convert ovl_real_fdget() callers to ovl_real_file()
  2024-10-04 10:23 [PATCH 0/4] Stash overlay real upper file in backing_file Amir Goldstein
                   ` (2 preceding siblings ...)
  2024-10-04 10:23 ` [PATCH 3/4] ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta() Amir Goldstein
@ 2024-10-04 10:23 ` Amir Goldstein
  2024-10-04 22:25   ` Al Viro
  3 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-10-04 10:23 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

Stop using struct fd to return a real file from ovl_real_fdget(),
because we no longer return a temporary file object and the callers
always get a borrowed file reference.

Rename the helper to ovl_real_file(), return a borrowed reference of
the real file that is referenced from the overlayfs file or an error.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 142 ++++++++++++++++++--------------------------
 1 file changed, 58 insertions(+), 84 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index d383ff22ccdb..b8581c253d45 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -161,15 +161,6 @@ static struct file *ovl_real_file(const struct file *file)
 	return ovl_real_file_meta(file, false);
 }
 
-static int ovl_real_fdget(const struct file *file, struct fd *real)
-{
-	struct file *f = ovl_real_file(file, false);
-	if (IS_ERR(f))
-		return PTR_ERR(f);
-	real->word = (unsigned long)f;
-	return 0;
-}
-
 static int ovl_open(struct inode *inode, struct file *file)
 {
 	struct dentry *dentry = file_dentry(file);
@@ -221,7 +212,7 @@ static int ovl_release(struct inode *inode, struct file *file)
 static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 {
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -237,9 +228,9 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 			return vfs_setpos(file, 0, 0);
 	}
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
 	/*
 	 * Overlay file f_pos is the master copy that is preserved
@@ -249,17 +240,15 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 	 * files, so we use the real file to perform seeks.
 	 */
 	ovl_inode_lock(inode);
-	fd_file(real)->f_pos = file->f_pos;
+	realfile->f_pos = file->f_pos;
 
 	old_cred = ovl_override_creds(inode->i_sb);
-	ret = vfs_llseek(fd_file(real), offset, whence);
+	ret = vfs_llseek(realfile, offset, whence);
 	revert_creds(old_cred);
 
-	file->f_pos = fd_file(real)->f_pos;
+	file->f_pos = realfile->f_pos;
 	ovl_inode_unlock(inode);
 
-	fdput(real);
-
 	return ret;
 }
 
@@ -300,8 +289,7 @@ static void ovl_file_accessed(struct file *file)
 static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	struct fd real;
-	ssize_t ret;
+	struct file *realfile;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(file)->i_sb),
 		.user_file = file,
@@ -311,22 +299,19 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (!iov_iter_count(iter))
 		return 0;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
-
-	ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
-				     &ctx);
-	fdput(real);
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
-	return ret;
+	return backing_file_read_iter(realfile, iter, iocb, iocb->ki_flags,
+				      &ctx);
 }
 
 static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct file *realfile;
 	ssize_t ret;
 	int ifl = iocb->ki_flags;
 	struct backing_file_ctx ctx = {
@@ -342,8 +327,9 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
+	realfile = ovl_real_file(file);
+	ret = PTR_ERR(realfile);
+	if (IS_ERR(realfile))
 		goto out_unlock;
 
 	if (!ovl_should_sync(OVL_FS(inode->i_sb)))
@@ -354,8 +340,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * this property in case it is set by the issuer.
 	 */
 	ifl &= ~IOCB_DIO_CALLER_COMP;
-	ret = backing_file_write_iter(fd_file(real), iter, iocb, ifl, &ctx);
-	fdput(real);
+	ret = backing_file_write_iter(realfile, iter, iocb, ifl, &ctx);
 
 out_unlock:
 	inode_unlock(inode);
@@ -367,28 +352,24 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 			       struct pipe_inode_info *pipe, size_t len,
 			       unsigned int flags)
 {
-	struct fd real;
-	ssize_t ret;
+	struct file *realfile;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(in)->i_sb),
 		.user_file = in,
 		.accessed = ovl_file_accessed,
 	};
 
-	ret = ovl_real_fdget(in, &real);
-	if (ret)
-		return ret;
-
-	ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
-	fdput(real);
+	realfile = ovl_real_file(in);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
-	return ret;
+	return backing_file_splice_read(realfile, ppos, pipe, len, flags, &ctx);
 }
 
 /*
  * Calling iter_file_splice_write() directly from overlay's f_op may deadlock
  * due to lock order inversion between pipe->mutex in iter_file_splice_write()
- * and file_start_write(fd_file(real)) in ovl_write_iter().
+ * and file_start_write(realfile) in ovl_write_iter().
  *
  * So do everything ovl_write_iter() does and call iter_file_splice_write() on
  * the real file.
@@ -396,7 +377,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 				loff_t *ppos, size_t len, unsigned int flags)
 {
-	struct fd real;
+	struct file *realfile;
 	struct inode *inode = file_inode(out);
 	ssize_t ret;
 	struct backing_file_ctx ctx = {
@@ -409,12 +390,12 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	ret = ovl_real_fdget(out, &real);
-	if (ret)
+	realfile = ovl_real_file(out);
+	ret = PTR_ERR(realfile);
+	if (IS_ERR(realfile))
 		goto out_unlock;
 
-	ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
-	fdput(real);
+	ret = backing_file_splice_write(pipe, realfile, ppos, len, flags, &ctx);
 
 out_unlock:
 	inode_unlock(inode);
@@ -461,7 +442,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	int ret;
 
@@ -472,19 +453,18 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	if (ret)
 		goto out_unlock;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
+	realfile = ovl_real_file(file);
+	ret = PTR_ERR(realfile);
+	if (IS_ERR(realfile))
 		goto out_unlock;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fallocate(fd_file(real), mode, offset, len);
+	ret = vfs_fallocate(realfile, mode, offset, len);
 	revert_creds(old_cred);
 
 	/* Update size */
 	ovl_file_modified(file);
 
-	fdput(real);
-
 out_unlock:
 	inode_unlock(inode);
 
@@ -493,20 +473,18 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 {
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	int ret;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fadvise(fd_file(real), offset, len, advice);
+	ret = vfs_fadvise(realfile, offset, len, advice);
 	revert_creds(old_cred);
 
-	fdput(real);
-
 	return ret;
 }
 
@@ -521,7 +499,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			    loff_t len, unsigned int flags, enum ovl_copyop op)
 {
 	struct inode *inode_out = file_inode(file_out);
-	struct fd real_in, real_out;
+	struct file *realfile_in, *realfile_out;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -534,31 +512,31 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			goto out_unlock;
 	}
 
-	ret = ovl_real_fdget(file_out, &real_out);
-	if (ret)
+	realfile_out = ovl_real_file(file_out);
+	ret = PTR_ERR(realfile_out);
+	if (IS_ERR(realfile_out))
 		goto out_unlock;
 
-	ret = ovl_real_fdget(file_in, &real_in);
-	if (ret) {
-		fdput(real_out);
+	realfile_in = ovl_real_file(file_in);
+	ret = PTR_ERR(realfile_in);
+	if (IS_ERR(realfile_in))
 		goto out_unlock;
-	}
 
 	old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
 	switch (op) {
 	case OVL_COPY:
-		ret = vfs_copy_file_range(fd_file(real_in), pos_in,
-					  fd_file(real_out), pos_out, len, flags);
+		ret = vfs_copy_file_range(realfile_in, pos_in,
+					  realfile_out, pos_out, len, flags);
 		break;
 
 	case OVL_CLONE:
-		ret = vfs_clone_file_range(fd_file(real_in), pos_in,
-					   fd_file(real_out), pos_out, len, flags);
+		ret = vfs_clone_file_range(realfile_in, pos_in,
+					   realfile_out, pos_out, len, flags);
 		break;
 
 	case OVL_DEDUPE:
-		ret = vfs_dedupe_file_range_one(fd_file(real_in), pos_in,
-						fd_file(real_out), pos_out, len,
+		ret = vfs_dedupe_file_range_one(realfile_in, pos_in,
+						realfile_out, pos_out, len,
 						flags);
 		break;
 	}
@@ -567,9 +545,6 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	/* Update size */
 	ovl_file_modified(file_out);
 
-	fdput(real_in);
-	fdput(real_out);
-
 out_unlock:
 	inode_unlock(inode_out);
 
@@ -613,20 +588,19 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
 
 static int ovl_flush(struct file *file, fl_owner_t id)
 {
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	int err;
 
-	err = ovl_real_fdget(file, &real);
-	if (err)
-		return err;
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
-	if (fd_file(real)->f_op->flush) {
+	if (realfile->f_op->flush) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
-		err = fd_file(real)->f_op->flush(fd_file(real), id);
+		err = realfile->f_op->flush(realfile, id);
 		revert_creds(old_cred);
 	}
-	fdput(real);
 
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ovl: do not open non-data lower file for fsync
  2024-10-04 10:23 ` [PATCH 1/4] ovl: do not open non-data lower file for fsync Amir Goldstein
@ 2024-10-04 22:16   ` Al Viro
  2024-10-04 22:28     ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2024-10-04 22:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Fri, Oct 04, 2024 at 12:23:39PM +0200, Amir Goldstein wrote:
> ovl_fsync() with !datasync opens a backing file from the top most dentry
> in the stack, checks if this dentry is non-upper and skips the fsync.
> 
> In case of an overlay dentry stack with lower data and lower metadata
> above it, but without an upper metadata above it, the backing file is
> opened from the top most lower metadata dentry and never used.
> 
> Fix the helper ovl_real_fdget_meta() to return an empty struct fd in
> that case to avoid the unneeded backing file open.

Umm...  Won't that screw the callers of ovl_real_fd()?

I mean, here

> @@ -395,7 +397,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  		return ret;
>  
>  	ret = ovl_real_fdget_meta(file, &real, !datasync);
> -	if (ret)
> +	if (ret || fd_empty(real))
>  		return ret;
>  

you are taking account of that, but what of e.g.
        ret = ovl_real_fdget(file, &real);
        if (ret)
                return ret;

        /*
         * Overlay file f_pos is the master copy that is preserved
         * through copy up and modified on read/write, but only real
         * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
         * limitations that are more strict than ->s_maxbytes for specific
         * files, so we use the real file to perform seeks.
         */
        ovl_inode_lock(inode);
        fd_file(real)->f_pos = file->f_pos;
in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
have it return 0 with NULL in fd_file(real), and you've got an oops right
there, don't you?

At the very least it's a bisect hazard...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta()
  2024-10-04 10:23 ` [PATCH 3/4] ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta() Amir Goldstein
@ 2024-10-04 22:23   ` Al Viro
  2024-10-05 12:37     ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2024-10-04 22:23 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Fri, Oct 04, 2024 at 12:23:41PM +0200, Amir Goldstein wrote:

>  	if (upper_meta) {
>  		ovl_path_upper(dentry, &realpath);
>  		if (!realpath.dentry)
> -			return 0;
> +			return NULL;
>  	} else {
>  		/* lazy lookup and verify of lowerdata */
>  		err = ovl_verify_lowerdata(dentry);
>  		if (err)
> -			return err;
> +			return ERR_PTR(err);

Ugh...  That kind of calling conventions is generally a bad idea.

> +	return realfile;

... especially since it's NULL/ERR_PTR()/pointer to object.


> +	realfile = ovl_real_file_meta(file, !datasync);
> +	if (IS_ERR_OR_NULL(realfile))
> +		return PTR_ERR(realfile);

Please, don't.  IS_ERR_OR_NULL is bogus 9 times out of 10 (at least).
And you still have breakage in llseek et.al.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] ovl: convert ovl_real_fdget() callers to ovl_real_file()
  2024-10-04 10:23 ` [PATCH 4/4] ovl: convert ovl_real_fdget() callers to ovl_real_file() Amir Goldstein
@ 2024-10-04 22:25   ` Al Viro
  0 siblings, 0 replies; 14+ messages in thread
From: Al Viro @ 2024-10-04 22:25 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Fri, Oct 04, 2024 at 12:23:42PM +0200, Amir Goldstein wrote:
> -	ret = ovl_real_fdget(file, &real);
> -	if (ret)
> -		return ret;
> +	realfile = ovl_real_file(file);
> +	if (IS_ERR(realfile))
> +		return PTR_ERR(realfile);

> -	fd_file(real)->f_pos = file->f_pos;
> +	realfile->f_pos = file->f_pos;

Still an oops, unless I'm misreading that.

> +	realfile = ovl_real_file(file);
> +	if (IS_ERR(realfile))
> +		return PTR_ERR(realfile);
>  
> -	return ret;

so's this

> +	return backing_file_read_iter(realfile, iter, iocb, iocb->ki_flags,
> +				      &ctx);
>  }

... and all the way through the rest of this commit.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ovl: do not open non-data lower file for fsync
  2024-10-04 22:16   ` Al Viro
@ 2024-10-04 22:28     ` Al Viro
  2024-10-05  1:35       ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2024-10-04 22:28 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Fri, Oct 04, 2024 at 11:16:25PM +0100, Al Viro wrote:
> On Fri, Oct 04, 2024 at 12:23:39PM +0200, Amir Goldstein wrote:
> > ovl_fsync() with !datasync opens a backing file from the top most dentry
> > in the stack, checks if this dentry is non-upper and skips the fsync.
> > 
> > In case of an overlay dentry stack with lower data and lower metadata
> > above it, but without an upper metadata above it, the backing file is
> > opened from the top most lower metadata dentry and never used.
> > 
> > Fix the helper ovl_real_fdget_meta() to return an empty struct fd in
> > that case to avoid the unneeded backing file open.
> 
> Umm...  Won't that screw the callers of ovl_real_fd()?
> 
> I mean, here
> 
> > @@ -395,7 +397,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >  		return ret;
> >  
> >  	ret = ovl_real_fdget_meta(file, &real, !datasync);
> > -	if (ret)
> > +	if (ret || fd_empty(real))
> >  		return ret;
> >  
> 
> you are taking account of that, but what of e.g.
>         ret = ovl_real_fdget(file, &real);
>         if (ret)
>                 return ret;
> 
>         /*
>          * Overlay file f_pos is the master copy that is preserved
>          * through copy up and modified on read/write, but only real
>          * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
>          * limitations that are more strict than ->s_maxbytes for specific
>          * files, so we use the real file to perform seeks.
>          */
>         ovl_inode_lock(inode);
>         fd_file(real)->f_pos = file->f_pos;
> in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
> have it return 0 with NULL in fd_file(real), and you've got an oops right
> there, don't you?

I see... so you rely upon that thing never happening when the last argument of
ovl_real_fdget_meta() is false, including the call from ovl_real_fdget().

I still don't like the calling conventions, TBH.  Let me think a bit...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ovl: do not open non-data lower file for fsync
  2024-10-04 22:28     ` Al Viro
@ 2024-10-05  1:35       ` Al Viro
  2024-10-05  6:30         ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2024-10-05  1:35 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Fri, Oct 04, 2024 at 11:28:11PM +0100, Al Viro wrote:
> >         /*
> >          * Overlay file f_pos is the master copy that is preserved
> >          * through copy up and modified on read/write, but only real
> >          * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
> >          * limitations that are more strict than ->s_maxbytes for specific
> >          * files, so we use the real file to perform seeks.
> >          */
> >         ovl_inode_lock(inode);
> >         fd_file(real)->f_pos = file->f_pos;
> > in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
> > have it return 0 with NULL in fd_file(real), and you've got an oops right
> > there, don't you?
> 
> I see... so you rely upon that thing never happening when the last argument of
> ovl_real_fdget_meta() is false, including the call from ovl_real_fdget().
> 
> I still don't like the calling conventions, TBH.  Let me think a bit...

Sorry, I'm afraid I'll have to leave that until tomorrow - over 38C after the sodding
shingles shot really screws the ability to dig through the code ;-/  My apologies...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ovl: do not open non-data lower file for fsync
  2024-10-05  1:35       ` Al Viro
@ 2024-10-05  6:30         ` Amir Goldstein
  2024-10-05 19:49           ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-10-05  6:30 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Sat, Oct 5, 2024 at 3:35 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Oct 04, 2024 at 11:28:11PM +0100, Al Viro wrote:
> > >         /*
> > >          * Overlay file f_pos is the master copy that is preserved
> > >          * through copy up and modified on read/write, but only real
> > >          * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
> > >          * limitations that are more strict than ->s_maxbytes for specific
> > >          * files, so we use the real file to perform seeks.
> > >          */
> > >         ovl_inode_lock(inode);
> > >         fd_file(real)->f_pos = file->f_pos;
> > > in ovl_llseek()?  Get ovl_real_fdget_meta() called by ovl_real_fdget() and
> > > have it return 0 with NULL in fd_file(real), and you've got an oops right
> > > there, don't you?
> >
> > I see... so you rely upon that thing never happening when the last argument of
> > ovl_real_fdget_meta() is false, including the call from ovl_real_fdget().
> >

Correct. I had considered renaming the argument to allow_empty_upper_meta
but I don't think that will make the contract a lot better.
The thing is that ovl_fsync() caller is really different in two
different aspects:
1. It wants only upper and therefore fd_empty() is a possible outcome
2. It (may) want the metadata inode (when data is still in lower inode)

Overlayfs can have up to 3 different inodes in the stack for a regular file:
lower_data+lower_metadata+upper_metdata

There is currently no file operation that requires opening the lower_metadata
inode and therefore, staching one backing file in ->private_data and another
optional backing file chained from the first one is enough.

If there is ever a file operation that needs to open a realfile from
lower_metadata (only ioctl comes to mind), we may need to reevaluate.

> > I still don't like the calling conventions, TBH.  Let me think a bit...
>

I understand your concern, but honestly, I am not sure that returning
struct fderr is fundamentally different from checking IS_ERR_OR_NULL.

What I can do is refactor the helpers differently so that ovl_fsync() will
call ovl_file_upper() to clarify that it may return NULL, just like
ovl_{dentry,inode,path}_upper() and all the other callers will
call ovl_file_real() which cannot return NULL, because it returns
either lower or upper file, just like ovl_{inode,path}_real{,data}().

> Sorry, I'm afraid I'll have to leave that until tomorrow - over 38C after the sodding
> shingles shot really screws the ability to dig through the code ;-/  My apologies...

Ouch! feel well soon.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta()
  2024-10-04 22:23   ` Al Viro
@ 2024-10-05 12:37     ` Amir Goldstein
  0 siblings, 0 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-10-05 12:37 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Sat, Oct 5, 2024 at 12:23 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Oct 04, 2024 at 12:23:41PM +0200, Amir Goldstein wrote:
>
> >       if (upper_meta) {
> >               ovl_path_upper(dentry, &realpath);
> >               if (!realpath.dentry)
> > -                     return 0;
> > +                     return NULL;
> >       } else {
> >               /* lazy lookup and verify of lowerdata */
> >               err = ovl_verify_lowerdata(dentry);
> >               if (err)
> > -                     return err;
> > +                     return ERR_PTR(err);
>
> Ugh...  That kind of calling conventions is generally a bad idea.
>
> > +     return realfile;
>
> ... especially since it's NULL/ERR_PTR()/pointer to object.
>
>
> > +     realfile = ovl_real_file_meta(file, !datasync);
> > +     if (IS_ERR_OR_NULL(realfile))
> > +             return PTR_ERR(realfile);
>
> Please, don't.  IS_ERR_OR_NULL is bogus 9 times out of 10 (at least).

IDK, we have quite a few of these constants in ovl code and it's pretty
clear and useful to my taste, but I am open to being corrected.

Anyway, I pushed a new version to
https://github.com/amir73il/linux/commits/ovl_real_file-v2/

Where we have:
- ovl_dir_real_file() and ovl_upper_file() can return NULL and their
  few callers check for IS_ERR_OR_NULL()

> And you still have breakage in llseek et.al.

- ovl_real_file() and ovl_real_file_path() cannot return NULL and all
  their callers (llseek et.al.) check only for IS_ERR()

Let me know if you think this is still problematic.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ovl: do not open non-data lower file for fsync
  2024-10-05  6:30         ` Amir Goldstein
@ 2024-10-05 19:49           ` Al Viro
  2024-10-06  8:03             ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2024-10-05 19:49 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Sat, Oct 05, 2024 at 08:30:23AM +0200, Amir Goldstein wrote:

> I understand your concern, but honestly, I am not sure that returning
> struct fderr is fundamentally different from checking IS_ERR_OR_NULL.
> 
> What I can do is refactor the helpers differently so that ovl_fsync() will
> call ovl_file_upper() to clarify that it may return NULL, just like

ovl_dentry_upper(), you mean?

> ovl_{dentry,inode,path}_upper() and all the other callers will
> call ovl_file_real() which cannot return NULL, because it returns
> either lower or upper file, just like ovl_{inode,path}_real{,data}().

OK...  One thing I'm not happy about is the control (and data) flow in there:
stashed_upper:
        if (upperfile && file_inode(upperfile) == d_inode(realpath.dentry))
                realfile = upperfile;

        /*
         * If realfile is lower and has been copied up since we'd opened it,
         * open the real upper file and stash it in backing_file_private().
         */
        if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
                struct file *old;

                /* Stashed upperfile has a mismatched inode */
                if (unlikely(upperfile))
                        return ERR_PTR(-EIO);

                upperfile = ovl_open_realfile(file, &realpath);
                if (IS_ERR(upperfile))
                        return upperfile;

                old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
                                      upperfile);
                if (old) {
                        fput(upperfile);
                        upperfile = old;
                }

                goto stashed_upper;
        }
Unless I'm misreading that, the value of realfile seen inside the second
if is always the original; reassignment in the first if might as well had
been followed by goto past the second one.  What's more, if you win the
race in the second if, you'll have upperfile != NULL and its file_inode()
matching realpath.dentry->d_inode (you'd better, or you have a real problem
in backing_file_open()).  So that branch to stashed_upper in case old == NULL
might as well had been "realfile = upperfile;".  Correct?  In case old != NULL
we go there with upperfile != NULL.  If it (i.e. old) has the right file_inode(),
we are done; otherwise it's going to hit ERR_PTR(-EIO) in the second if.

So it seems to be equivalent to this:
        if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
	        /*
		 * If realfile is lower and has been copied up since we'd
		 * opened it, we need the real upper file opened.  Whoever gets
		 * there first stashes the result in backing_file_private().
		 */
		struct file *upperfile = backing_file_private(realfile);
		if (unlikely(!upperfile)) {
			struct file *old;

			upperfile = ovl_open_realfile(file, &realpath);
			if (IS_ERR(upperfile))
				return upperfile;

			old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
					      upperfile);
			if (old) {
				fput(upperfile);
				upperfile = old;
			}
		}
		// upperfile reference is owned by realfile at that point
		if (unlikely(file_inode(upperfile) != d_inode(realpath.dentry)))
			/* Stashed upperfile has a mismatched inode */
			return ERR_PTR(-EIO);
		realfile = upperfile;
	}
Or am I misreading it?  Seems to be more straightforward that way...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] ovl: do not open non-data lower file for fsync
  2024-10-05 19:49           ` Al Viro
@ 2024-10-06  8:03             ` Amir Goldstein
  0 siblings, 0 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-10-06  8:03 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Sat, Oct 5, 2024 at 9:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Oct 05, 2024 at 08:30:23AM +0200, Amir Goldstein wrote:
>
> > I understand your concern, but honestly, I am not sure that returning
> > struct fderr is fundamentally different from checking IS_ERR_OR_NULL.
> >
> > What I can do is refactor the helpers differently so that ovl_fsync() will
> > call ovl_file_upper() to clarify that it may return NULL, just like
>
> ovl_dentry_upper(), you mean?

No, I meant I created a new helper ovl_upper_file() that only ovl_fsync()
uses and can return NULL.
All the rest of the callers are using the helper ovl_real_file() which cannot
return NULL.
I will post it.

>
> > ovl_{dentry,inode,path}_upper() and all the other callers will
> > call ovl_file_real() which cannot return NULL, because it returns
> > either lower or upper file, just like ovl_{inode,path}_real{,data}().
>
> OK...  One thing I'm not happy about is the control (and data) flow in there:
> stashed_upper:
>         if (upperfile && file_inode(upperfile) == d_inode(realpath.dentry))
>                 realfile = upperfile;
>
>         /*
>          * If realfile is lower and has been copied up since we'd opened it,
>          * open the real upper file and stash it in backing_file_private().
>          */
>         if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
>                 struct file *old;
>
>                 /* Stashed upperfile has a mismatched inode */
>                 if (unlikely(upperfile))
>                         return ERR_PTR(-EIO);
>
>                 upperfile = ovl_open_realfile(file, &realpath);
>                 if (IS_ERR(upperfile))
>                         return upperfile;
>
>                 old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
>                                       upperfile);
>                 if (old) {
>                         fput(upperfile);
>                         upperfile = old;
>                 }
>
>                 goto stashed_upper;
>         }
> Unless I'm misreading that, the value of realfile seen inside the second
> if is always the original; reassignment in the first if might as well had
> been followed by goto past the second one.  What's more, if you win the
> race in the second if, you'll have upperfile != NULL and its file_inode()
> matching realpath.dentry->d_inode (you'd better, or you have a real problem
> in backing_file_open()).  So that branch to stashed_upper in case old == NULL
> might as well had been "realfile = upperfile;".  Correct?  In case old != NULL

Correct.

> we go there with upperfile != NULL.  If it (i.e. old) has the right file_inode(),
> we are done; otherwise it's going to hit ERR_PTR(-EIO) in the second if.
>
> So it seems to be equivalent to this:
>         if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
>                 /*
>                  * If realfile is lower and has been copied up since we'd
>                  * opened it, we need the real upper file opened.  Whoever gets
>                  * there first stashes the result in backing_file_private().
>                  */
>                 struct file *upperfile = backing_file_private(realfile);
>                 if (unlikely(!upperfile)) {
>                         struct file *old;
>
>                         upperfile = ovl_open_realfile(file, &realpath);
>                         if (IS_ERR(upperfile))
>                                 return upperfile;
>
>                         old = cmpxchg_release(backing_file_private_ptr(realfile), NULL,
>                                               upperfile);
>                         if (old) {
>                                 fput(upperfile);
>                                 upperfile = old;
>                         }
>                 }
>                 // upperfile reference is owned by realfile at that point
>                 if (unlikely(file_inode(upperfile) != d_inode(realpath.dentry)))
>                         /* Stashed upperfile has a mismatched inode */
>                         return ERR_PTR(-EIO);
>                 realfile = upperfile;
>         }
> Or am I misreading it?  Seems to be more straightforward that way...

Yeh, that's a bit more clear without to goto, but I would not remove
the if (upperfile) assertion. Actually the first if has a bug.
It assumes that if the upperfile is stashed then it must be used, but
this is incorrect.

I have made the following change:

static bool ovl_is_real_file(const struct file *realfile,
                             const struct path *realpath)
{
        return file_inode(realfile) == d_inode(realpath->dentry);
}
...
        /*
         * Usually, if we operated on a stashed upperfile once, all following
         * operations will operate on the stashed upperfile, but there is one
         * exception - ovl_fsync(datasync = false) can populate the stashed
         * upperfile to perform fsync on upper metadata inode.  In this case,
         * following read/write operations will not use the stashed upperfile.
         */
        if (upperfile && likely(ovl_is_real_file(upperfile, realpath))) {
                realfile = upperfile;
                goto checkflags;
        }

        /*
         * If realfile is lower and has been copied up since we'd opened it,
         * open the real upper file and stash it in backing_file_private().
         */
        if (unlikely(!ovl_is_real_file(realfile, realpath))) {
                struct file *old;

                /* Either stashed realfile or upperfile must match realinode */
                if (WARN_ON_ONCE(upperfile))
                        return ERR_PTR(-EIO);
...
                /* Stashed upperfile that won the race must match realinode */
                if (WARN_ON_ONCE(!ovl_is_real_file(upperfile, realpath)))
                        return ERR_PTR(-EIO);
               realfile = upperfile;
        }

checkflags:
...

What happens is that we have two slots for stashing real files
and there are subtle assumptions in the code that
1. We will never be requested to open a real file for more than
    two inodes (lower and upper)
2. If we are requested to open two inodes, we will always be
    requested to open the lower inode first
3. IOW if we are requested to open the upper inode first, we
    will not be requested to open a different inode

Therefore, the names realfile/upperfile are a bit misleading.
If file is opened after copyup, then the realfile in ->private_data
is the upper file and stashed upperfile is NULL.

I will post the revised version.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-10-06  8:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-04 10:23 [PATCH 0/4] Stash overlay real upper file in backing_file Amir Goldstein
2024-10-04 10:23 ` [PATCH 1/4] ovl: do not open non-data lower file for fsync Amir Goldstein
2024-10-04 22:16   ` Al Viro
2024-10-04 22:28     ` Al Viro
2024-10-05  1:35       ` Al Viro
2024-10-05  6:30         ` Amir Goldstein
2024-10-05 19:49           ` Al Viro
2024-10-06  8:03             ` Amir Goldstein
2024-10-04 10:23 ` [PATCH 2/4] ovl: stash upper real file in backing_file struct Amir Goldstein
2024-10-04 10:23 ` [PATCH 3/4] ovl: convert ovl_real_fdget_meta() callers to ovl_real_file_meta() Amir Goldstein
2024-10-04 22:23   ` Al Viro
2024-10-05 12:37     ` Amir Goldstein
2024-10-04 10:23 ` [PATCH 4/4] ovl: convert ovl_real_fdget() callers to ovl_real_file() Amir Goldstein
2024-10-04 22:25   ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).