linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/5] Store overlay real upper file in ovl_file
@ 2024-10-07 14:19 Amir Goldstein
  2024-10-07 14:19 ` [PATCH v3 1/5] ovl: do not open non-data lower file for fsync Amir Goldstein
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-10-07 14:19 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

Hi all,

This is v3 of the code to avoid temporary backing file opens in
overlayfs, taking into account Al's and Miklos' comments on v2 [1].

If no further comments, this is going for overlayfs-next.

Thanks,
Amir.

Changes since v2:
- Simplifications to flow (Al)
- Loose backing_file stash in favor of ovl_file (Miklos)

Changes since v1:
- Use helpers ovl_real_file() and ovl_upper_file() to express that
  ovl_real_file() cannot return NULL
- Fix readability and bug is code to select and store stashed upperfile

[1] https://lore.kernel.org/linux-unionfs/20241006082359.263755-1-amir73il@gmail.com/

Amir Goldstein (5):
  ovl: do not open non-data lower file for fsync
  ovl: allocate a container struct ovl_file for ovl private context
  ovl: store upper real file in ovl_file struct
  ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
  ovl: convert ovl_real_fdget() callers to ovl_real_file()

 fs/overlayfs/file.c | 301 +++++++++++++++++++++++++-------------------
 1 file changed, 173 insertions(+), 128 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/5] ovl: do not open non-data lower file for fsync
  2024-10-07 14:19 [PATCH v3 0/5] Store overlay real upper file in ovl_file Amir Goldstein
@ 2024-10-07 14:19 ` Amir Goldstein
  2024-10-07 14:39   ` Al Viro
  2024-10-07 16:55   ` Al Viro
  2024-10-07 14:19 ` [PATCH v3 2/5] ovl: allocate a container struct ovl_file for ovl private context Amir Goldstein
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-10-07 14:19 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

ovl_fsync() with !datasync opens a backing file from the top most dentry
in the stack, checks if this dentry is non-upper and skips the fsync.

In case of an overlay dentry stack with lower data and lower metadata
above it, but without an upper metadata above it, the backing file is
opened from the top most lower metadata dentry and never used.

Refactor the helper ovl_real_fdget_meta() into ovl_real_fdget_path() and
ovl_upper_fdget() where the latter returns an empty struct fd in that
case to avoid the unneeded backing file open.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 71 +++++++++++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4504493b20be..f5d0498355d0 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -89,32 +89,19 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 	return 0;
 }
 
-static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
-			       bool allow_meta)
+static int ovl_real_fdget_path(const struct file *file, struct fd *real,
+			       struct path *realpath)
 {
-	struct dentry *dentry = file_dentry(file);
 	struct file *realfile = file->private_data;
-	struct path realpath;
-	int err;
 
 	real->word = (unsigned long)realfile;
 
-	if (allow_meta) {
-		ovl_path_real(dentry, &realpath);
-	} else {
-		/* lazy lookup and verify of lowerdata */
-		err = ovl_verify_lowerdata(dentry);
-		if (err)
-			return err;
-
-		ovl_path_realdata(dentry, &realpath);
-	}
-	if (!realpath.dentry)
+	if (WARN_ON_ONCE(!realpath->dentry))
 		return -EIO;
 
 	/* Has it been copied up since we'd opened it? */
-	if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
-		struct file *f = ovl_open_realfile(file, &realpath);
+	if (unlikely(file_inode(realfile) != d_inode(realpath->dentry))) {
+		struct file *f = ovl_open_realfile(file, realpath);
 		if (IS_ERR(f))
 			return PTR_ERR(f);
 		real->word = (unsigned long)f | FDPUT_FPUT;
@@ -130,7 +117,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 
 static int ovl_real_fdget(const struct file *file, struct fd *real)
 {
-	if (d_is_dir(file_dentry(file))) {
+	struct dentry *dentry = file_dentry(file);
+	struct path realpath;
+	int err;
+
+	if (d_is_dir(dentry)) {
 		struct file *f = ovl_dir_real_file(file, false);
 		if (IS_ERR(f))
 			return PTR_ERR(f);
@@ -138,7 +129,33 @@ static int ovl_real_fdget(const struct file *file, struct fd *real)
 		return 0;
 	}
 
-	return ovl_real_fdget_meta(file, real, false);
+	/* lazy lookup and verify of lowerdata */
+	err = ovl_verify_lowerdata(dentry);
+	if (err)
+		return err;
+
+	ovl_path_realdata(dentry, &realpath);
+
+	return ovl_real_fdget_path(file, real, &realpath);
+}
+
+static int ovl_upper_fdget(const struct file *file, struct fd *real, bool data)
+{
+	struct dentry *dentry = file_dentry(file);
+	struct path realpath;
+	enum ovl_path_type type;
+
+	if (data)
+		type = ovl_path_realdata(dentry, &realpath);
+	else
+		type = ovl_path_real(dentry, &realpath);
+
+	real->word = 0;
+	/* Not interested in lower nor in upper meta if data was requested */
+	if (!OVL_TYPE_UPPER(type) || (data && OVL_TYPE_MERGE(type)))
+		return 0;
+
+	return ovl_real_fdget_path(file, real, &realpath);
 }
 
 static int ovl_open(struct inode *inode, struct file *file)
@@ -394,16 +411,14 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	if (ret <= 0)
 		return ret;
 
-	ret = ovl_real_fdget_meta(file, &real, !datasync);
-	if (ret)
+	/* Don't sync lower file for fear of receiving EROFS error */
+	ret = ovl_upper_fdget(file, &real, datasync);
+	if (ret || fd_empty(real))
 		return ret;
 
-	/* Don't sync lower file for fear of receiving EROFS error */
-	if (file_inode(fd_file(real)) == ovl_inode_upper(file_inode(file))) {
-		old_cred = ovl_override_creds(file_inode(file)->i_sb);
-		ret = vfs_fsync_range(fd_file(real), start, end, datasync);
-		revert_creds(old_cred);
-	}
+	old_cred = ovl_override_creds(file_inode(file)->i_sb);
+	ret = vfs_fsync_range(fd_file(real), start, end, datasync);
+	revert_creds(old_cred);
 
 	fdput(real);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/5] ovl: allocate a container struct ovl_file for ovl private context
  2024-10-07 14:19 [PATCH v3 0/5] Store overlay real upper file in ovl_file Amir Goldstein
  2024-10-07 14:19 ` [PATCH v3 1/5] ovl: do not open non-data lower file for fsync Amir Goldstein
@ 2024-10-07 14:19 ` Amir Goldstein
  2024-10-07 14:43   ` Al Viro
  2024-10-07 14:19 ` [PATCH v3 3/5] ovl: store upper real file in ovl_file struct Amir Goldstein
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-10-07 14:19 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

Instead of using ->private_data to point at realfile directly, so
that we can add more context per ovl open file.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index f5d0498355d0..03bf6037b129 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -89,10 +89,15 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 	return 0;
 }
 
+struct ovl_file {
+	struct file *realfile;
+};
+
 static int ovl_real_fdget_path(const struct file *file, struct fd *real,
 			       struct path *realpath)
 {
-	struct file *realfile = file->private_data;
+	struct ovl_file *of = file->private_data;
+	struct file *realfile = of->realfile;
 
 	real->word = (unsigned long)realfile;
 
@@ -163,6 +168,7 @@ static int ovl_open(struct inode *inode, struct file *file)
 	struct dentry *dentry = file_dentry(file);
 	struct file *realfile;
 	struct path realpath;
+	struct ovl_file *of;
 	int err;
 
 	/* lazy lookup and verify lowerdata */
@@ -181,18 +187,28 @@ static int ovl_open(struct inode *inode, struct file *file)
 	if (!realpath.dentry)
 		return -EIO;
 
+	of = kzalloc(sizeof(struct ovl_file), GFP_KERNEL);
+	if (!of)
+		return -ENOMEM;
+
 	realfile = ovl_open_realfile(file, &realpath);
-	if (IS_ERR(realfile))
+	if (IS_ERR(realfile)) {
+		kfree(of);
 		return PTR_ERR(realfile);
+	}
 
-	file->private_data = realfile;
+	of->realfile = realfile;
+	file->private_data = of;
 
 	return 0;
 }
 
 static int ovl_release(struct inode *inode, struct file *file)
 {
-	fput(file->private_data);
+	struct ovl_file *of = file->private_data;
+
+	fput(of->realfile);
+	kfree(of);
 
 	return 0;
 }
@@ -427,14 +443,14 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 
 static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	struct file *realfile = file->private_data;
+	struct ovl_file *of = file->private_data;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(file)->i_sb),
 		.user_file = file,
 		.accessed = ovl_file_accessed,
 	};
 
-	return backing_file_mmap(realfile, vma, &ctx);
+	return backing_file_mmap(of->realfile, vma, &ctx);
 }
 
 static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/5] ovl: store upper real file in ovl_file struct
  2024-10-07 14:19 [PATCH v3 0/5] Store overlay real upper file in ovl_file Amir Goldstein
  2024-10-07 14:19 ` [PATCH v3 1/5] ovl: do not open non-data lower file for fsync Amir Goldstein
  2024-10-07 14:19 ` [PATCH v3 2/5] ovl: allocate a container struct ovl_file for ovl private context Amir Goldstein
@ 2024-10-07 14:19 ` Amir Goldstein
  2024-10-07 14:19 ` [PATCH v3 4/5] ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path() Amir Goldstein
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-10-07 14:19 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

When an overlayfs file is opened as lower and then the file is copied up,
every operation on the overlayfs open file will open a temporary backing
file to the upper dentry and close it at the end of the operation.

Store the upper real file along side the original (lower) real file in
ovl_file instead of opening a temporary upper file on every operation.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 49 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 03bf6037b129..525bcddb49e5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -91,32 +91,63 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 
 struct ovl_file {
 	struct file *realfile;
+	struct file *upperfile;
 };
 
+static bool ovl_is_real_file(const struct file *realfile,
+			     const struct path *realpath)
+{
+	return file_inode(realfile) == d_inode(realpath->dentry);
+}
+
 static int ovl_real_fdget_path(const struct file *file, struct fd *real,
 			       struct path *realpath)
 {
 	struct ovl_file *of = file->private_data;
 	struct file *realfile = of->realfile;
 
-	real->word = (unsigned long)realfile;
+	real->word = 0;
 
 	if (WARN_ON_ONCE(!realpath->dentry))
 		return -EIO;
 
-	/* Has it been copied up since we'd opened it? */
-	if (unlikely(file_inode(realfile) != d_inode(realpath->dentry))) {
-		struct file *f = ovl_open_realfile(file, realpath);
-		if (IS_ERR(f))
-			return PTR_ERR(f);
-		real->word = (unsigned long)f | FDPUT_FPUT;
-		return 0;
+	/*
+	 * If the realfile that we want is not where the data used to be at
+	 * open time, either we'd been copied up, or it's an fsync of a
+	 * metacopied file.  We need the upperfile either way, so see if it
+	 * is already opened and if it is not then open and store it.
+	 */
+	if (unlikely(!ovl_is_real_file(realfile, realpath))) {
+		struct file *upperfile = READ_ONCE(of->upperfile);
+		struct file *old;
+
+		if (!upperfile) { /* Nobody opened upperfile yet */
+			upperfile = ovl_open_realfile(file, realpath);
+			if (IS_ERR(upperfile))
+				return PTR_ERR(upperfile);
+
+			/* Store the upperfile for later */
+			old = cmpxchg_release(&of->upperfile, NULL, upperfile);
+			if (old) { /* Someone opened upperfile before us */
+				fput(upperfile);
+				upperfile = old;
+			}
+		}
+		/*
+		 * Stored file must be from the right inode, unless someone's
+		 * been corrupting the upper layer.
+		 */
+		if (WARN_ON_ONCE(!ovl_is_real_file(upperfile, realpath)))
+			return -EIO;
+
+		realfile = upperfile;
 	}
 
 	/* Did the flags change since open? */
 	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
 		return ovl_change_flags(realfile, file->f_flags);
 
+	real->word = (unsigned long)realfile;
 	return 0;
 }
 
@@ -208,6 +239,8 @@ static int ovl_release(struct inode *inode, struct file *file)
 	struct ovl_file *of = file->private_data;
 
 	fput(of->realfile);
+	if (of->upperfile)
+		fput(of->upperfile);
 	kfree(of);
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/5] ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
  2024-10-07 14:19 [PATCH v3 0/5] Store overlay real upper file in ovl_file Amir Goldstein
                   ` (2 preceding siblings ...)
  2024-10-07 14:19 ` [PATCH v3 3/5] ovl: store upper real file in ovl_file struct Amir Goldstein
@ 2024-10-07 14:19 ` Amir Goldstein
  2024-10-07 14:19 ` [PATCH v3 5/5] ovl: convert ovl_real_fdget() callers to ovl_real_file() Amir Goldstein
  2024-10-17  4:52 ` [PATCH v3 0/5] Store overlay real upper file in ovl_file Al Viro
  5 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-10-07 14:19 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

Stop using struct fd to return a real file from ovl_real_fdget_path(),
because we no longer return a temporary file object and the callers
always get a borrowed file reference.

Rename the helper to ovl_real_file_path(), return a borrowed reference
of the real file that is referenced from the overlayfs file or an error.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 66 +++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 525bcddb49e5..b6d6ccc39dad 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -100,16 +100,14 @@ static bool ovl_is_real_file(const struct file *realfile,
 	return file_inode(realfile) == d_inode(realpath->dentry);
 }
 
-static int ovl_real_fdget_path(const struct file *file, struct fd *real,
-			       struct path *realpath)
+static struct file *ovl_real_file_path(const struct file *file,
+				       struct path *realpath)
 {
 	struct ovl_file *of = file->private_data;
 	struct file *realfile = of->realfile;
 
-	real->word = 0;
-
 	if (WARN_ON_ONCE(!realpath->dentry))
-		return -EIO;
+		return ERR_PTR(-EIO);
 
 	/*
 	 * If the realfile that we want is not where the data used to be at
@@ -124,7 +122,7 @@ static int ovl_real_fdget_path(const struct file *file, struct fd *real,
 		if (!upperfile) { /* Nobody opened upperfile yet */
 			upperfile = ovl_open_realfile(file, realpath);
 			if (IS_ERR(upperfile))
-				return PTR_ERR(upperfile);
+				return upperfile;
 
 			/* Store the upperfile for later */
 			old = cmpxchg_release(&of->upperfile, NULL, upperfile);
@@ -138,20 +136,23 @@ static int ovl_real_fdget_path(const struct file *file, struct fd *real,
 		 * been corrupting the upper layer.
 		 */
 		if (WARN_ON_ONCE(!ovl_is_real_file(upperfile, realpath)))
-			return -EIO;
+			return ERR_PTR(-EIO);
 
 		realfile = upperfile;
 	}
 
 	/* Did the flags change since open? */
-	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
-		return ovl_change_flags(realfile, file->f_flags);
+	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS)) {
+		int err = ovl_change_flags(realfile, file->f_flags);
 
-	real->word = (unsigned long)realfile;
-	return 0;
+		if (err)
+			return ERR_PTR(err);
+	}
+
+	return realfile;
 }
 
-static int ovl_real_fdget(const struct file *file, struct fd *real)
+static struct file *ovl_real_file(const struct file *file)
 {
 	struct dentry *dentry = file_dentry(file);
 	struct path realpath;
@@ -159,23 +160,33 @@ static int ovl_real_fdget(const struct file *file, struct fd *real)
 
 	if (d_is_dir(dentry)) {
 		struct file *f = ovl_dir_real_file(file, false);
-		if (IS_ERR(f))
-			return PTR_ERR(f);
-		real->word = (unsigned long)f;
-		return 0;
+
+		if (WARN_ON_ONCE(!f))
+			return ERR_PTR(-EIO);
+		return f;
 	}
 
 	/* lazy lookup and verify of lowerdata */
 	err = ovl_verify_lowerdata(dentry);
 	if (err)
-		return err;
+		return ERR_PTR(err);
 
 	ovl_path_realdata(dentry, &realpath);
 
-	return ovl_real_fdget_path(file, real, &realpath);
+	return ovl_real_file_path(file, &realpath);
 }
 
-static int ovl_upper_fdget(const struct file *file, struct fd *real, bool data)
+static int ovl_real_fdget(const struct file *file, struct fd *real)
+{
+	struct file *f = ovl_real_file(file);
+
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+	real->word = (unsigned long)f;
+	return 0;
+}
+
+static struct file *ovl_upper_file(const struct file *file, bool data)
 {
 	struct dentry *dentry = file_dentry(file);
 	struct path realpath;
@@ -186,12 +197,11 @@ static int ovl_upper_fdget(const struct file *file, struct fd *real, bool data)
 	else
 		type = ovl_path_real(dentry, &realpath);
 
-	real->word = 0;
 	/* Not interested in lower nor in upper meta if data was requested */
 	if (!OVL_TYPE_UPPER(type) || (data && OVL_TYPE_MERGE(type)))
-		return 0;
+		return NULL;
 
-	return ovl_real_fdget_path(file, real, &realpath);
+	return ovl_real_file_path(file, &realpath);
 }
 
 static int ovl_open(struct inode *inode, struct file *file)
@@ -452,7 +462,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
-	struct fd real;
+	struct file *upperfile;
 	const struct cred *old_cred;
 	int ret;
 
@@ -461,16 +471,14 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 
 	/* Don't sync lower file for fear of receiving EROFS error */
-	ret = ovl_upper_fdget(file, &real, datasync);
-	if (ret || fd_empty(real))
-		return ret;
+	upperfile = ovl_upper_file(file, datasync);
+	if (IS_ERR_OR_NULL(upperfile))
+		return PTR_ERR(upperfile);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fsync_range(fd_file(real), start, end, datasync);
+	ret = vfs_fsync_range(upperfile, start, end, datasync);
 	revert_creds(old_cred);
 
-	fdput(real);
-
 	return ret;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 5/5] ovl: convert ovl_real_fdget() callers to ovl_real_file()
  2024-10-07 14:19 [PATCH v3 0/5] Store overlay real upper file in ovl_file Amir Goldstein
                   ` (3 preceding siblings ...)
  2024-10-07 14:19 ` [PATCH v3 4/5] ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path() Amir Goldstein
@ 2024-10-07 14:19 ` Amir Goldstein
  2024-10-17  4:52 ` [PATCH v3 0/5] Store overlay real upper file in ovl_file Al Viro
  5 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-10-07 14:19 UTC (permalink / raw)
  To: Miklos Szeredi, Al Viro; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

Stop using struct fd to return a real file from ovl_real_fdget(),
because we no longer return a temporary file object and the callers
always get a borrowed file reference.

Rename the helper to ovl_real_file(), return a borrowed reference of
the real file that is referenced from the overlayfs file or an error.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 145 ++++++++++++++++++--------------------------
 1 file changed, 59 insertions(+), 86 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index b6d6ccc39dad..b7bec2adb575 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -176,16 +176,6 @@ static struct file *ovl_real_file(const struct file *file)
 	return ovl_real_file_path(file, &realpath);
 }
 
-static int ovl_real_fdget(const struct file *file, struct fd *real)
-{
-	struct file *f = ovl_real_file(file);
-
-	if (IS_ERR(f))
-		return PTR_ERR(f);
-	real->word = (unsigned long)f;
-	return 0;
-}
-
 static struct file *ovl_upper_file(const struct file *file, bool data)
 {
 	struct dentry *dentry = file_dentry(file);
@@ -259,7 +249,7 @@ static int ovl_release(struct inode *inode, struct file *file)
 static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 {
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -275,9 +265,9 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 			return vfs_setpos(file, 0, 0);
 	}
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
 	/*
 	 * Overlay file f_pos is the master copy that is preserved
@@ -287,17 +277,15 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 	 * files, so we use the real file to perform seeks.
 	 */
 	ovl_inode_lock(inode);
-	fd_file(real)->f_pos = file->f_pos;
+	realfile->f_pos = file->f_pos;
 
 	old_cred = ovl_override_creds(inode->i_sb);
-	ret = vfs_llseek(fd_file(real), offset, whence);
+	ret = vfs_llseek(realfile, offset, whence);
 	revert_creds(old_cred);
 
-	file->f_pos = fd_file(real)->f_pos;
+	file->f_pos = realfile->f_pos;
 	ovl_inode_unlock(inode);
 
-	fdput(real);
-
 	return ret;
 }
 
@@ -338,8 +326,7 @@ static void ovl_file_accessed(struct file *file)
 static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	struct fd real;
-	ssize_t ret;
+	struct file *realfile;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(file)->i_sb),
 		.user_file = file,
@@ -349,22 +336,19 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (!iov_iter_count(iter))
 		return 0;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
-
-	ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
-				     &ctx);
-	fdput(real);
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
-	return ret;
+	return backing_file_read_iter(realfile, iter, iocb, iocb->ki_flags,
+				      &ctx);
 }
 
 static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct file *realfile;
 	ssize_t ret;
 	int ifl = iocb->ki_flags;
 	struct backing_file_ctx ctx = {
@@ -380,8 +364,9 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
+	realfile = ovl_real_file(file);
+	ret = PTR_ERR(realfile);
+	if (IS_ERR(realfile))
 		goto out_unlock;
 
 	if (!ovl_should_sync(OVL_FS(inode->i_sb)))
@@ -392,8 +377,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * this property in case it is set by the issuer.
 	 */
 	ifl &= ~IOCB_DIO_CALLER_COMP;
-	ret = backing_file_write_iter(fd_file(real), iter, iocb, ifl, &ctx);
-	fdput(real);
+	ret = backing_file_write_iter(realfile, iter, iocb, ifl, &ctx);
 
 out_unlock:
 	inode_unlock(inode);
@@ -405,28 +389,24 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 			       struct pipe_inode_info *pipe, size_t len,
 			       unsigned int flags)
 {
-	struct fd real;
-	ssize_t ret;
+	struct file *realfile;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(in)->i_sb),
 		.user_file = in,
 		.accessed = ovl_file_accessed,
 	};
 
-	ret = ovl_real_fdget(in, &real);
-	if (ret)
-		return ret;
-
-	ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
-	fdput(real);
+	realfile = ovl_real_file(in);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
-	return ret;
+	return backing_file_splice_read(realfile, ppos, pipe, len, flags, &ctx);
 }
 
 /*
  * Calling iter_file_splice_write() directly from overlay's f_op may deadlock
  * due to lock order inversion between pipe->mutex in iter_file_splice_write()
- * and file_start_write(fd_file(real)) in ovl_write_iter().
+ * and file_start_write(realfile) in ovl_write_iter().
  *
  * So do everything ovl_write_iter() does and call iter_file_splice_write() on
  * the real file.
@@ -434,7 +414,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 				loff_t *ppos, size_t len, unsigned int flags)
 {
-	struct fd real;
+	struct file *realfile;
 	struct inode *inode = file_inode(out);
 	ssize_t ret;
 	struct backing_file_ctx ctx = {
@@ -447,12 +427,12 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	ret = ovl_real_fdget(out, &real);
-	if (ret)
+	realfile = ovl_real_file(out);
+	ret = PTR_ERR(realfile);
+	if (IS_ERR(realfile))
 		goto out_unlock;
 
-	ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
-	fdput(real);
+	ret = backing_file_splice_write(pipe, realfile, ppos, len, flags, &ctx);
 
 out_unlock:
 	inode_unlock(inode);
@@ -497,7 +477,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	int ret;
 
@@ -508,19 +488,18 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	if (ret)
 		goto out_unlock;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
+	realfile = ovl_real_file(file);
+	ret = PTR_ERR(realfile);
+	if (IS_ERR(realfile))
 		goto out_unlock;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fallocate(fd_file(real), mode, offset, len);
+	ret = vfs_fallocate(realfile, mode, offset, len);
 	revert_creds(old_cred);
 
 	/* Update size */
 	ovl_file_modified(file);
 
-	fdput(real);
-
 out_unlock:
 	inode_unlock(inode);
 
@@ -529,20 +508,18 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 {
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
 	int ret;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fadvise(fd_file(real), offset, len, advice);
+	ret = vfs_fadvise(realfile, offset, len, advice);
 	revert_creds(old_cred);
 
-	fdput(real);
-
 	return ret;
 }
 
@@ -557,7 +534,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			    loff_t len, unsigned int flags, enum ovl_copyop op)
 {
 	struct inode *inode_out = file_inode(file_out);
-	struct fd real_in, real_out;
+	struct file *realfile_in, *realfile_out;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -570,31 +547,31 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			goto out_unlock;
 	}
 
-	ret = ovl_real_fdget(file_out, &real_out);
-	if (ret)
+	realfile_out = ovl_real_file(file_out);
+	ret = PTR_ERR(realfile_out);
+	if (IS_ERR(realfile_out))
 		goto out_unlock;
 
-	ret = ovl_real_fdget(file_in, &real_in);
-	if (ret) {
-		fdput(real_out);
+	realfile_in = ovl_real_file(file_in);
+	ret = PTR_ERR(realfile_in);
+	if (IS_ERR(realfile_in))
 		goto out_unlock;
-	}
 
 	old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
 	switch (op) {
 	case OVL_COPY:
-		ret = vfs_copy_file_range(fd_file(real_in), pos_in,
-					  fd_file(real_out), pos_out, len, flags);
+		ret = vfs_copy_file_range(realfile_in, pos_in,
+					  realfile_out, pos_out, len, flags);
 		break;
 
 	case OVL_CLONE:
-		ret = vfs_clone_file_range(fd_file(real_in), pos_in,
-					   fd_file(real_out), pos_out, len, flags);
+		ret = vfs_clone_file_range(realfile_in, pos_in,
+					   realfile_out, pos_out, len, flags);
 		break;
 
 	case OVL_DEDUPE:
-		ret = vfs_dedupe_file_range_one(fd_file(real_in), pos_in,
-						fd_file(real_out), pos_out, len,
+		ret = vfs_dedupe_file_range_one(realfile_in, pos_in,
+						realfile_out, pos_out, len,
 						flags);
 		break;
 	}
@@ -603,9 +580,6 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	/* Update size */
 	ovl_file_modified(file_out);
 
-	fdput(real_in);
-	fdput(real_out);
-
 out_unlock:
 	inode_unlock(inode_out);
 
@@ -649,20 +623,19 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
 
 static int ovl_flush(struct file *file, fl_owner_t id)
 {
-	struct fd real;
+	struct file *realfile;
 	const struct cred *old_cred;
-	int err;
+	int err = 0;
 
-	err = ovl_real_fdget(file, &real);
-	if (err)
-		return err;
+	realfile = ovl_real_file(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
 
-	if (fd_file(real)->f_op->flush) {
+	if (realfile->f_op->flush) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
-		err = fd_file(real)->f_op->flush(fd_file(real), id);
+		err = realfile->f_op->flush(realfile, id);
 		revert_creds(old_cred);
 	}
-	fdput(real);
 
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/5] ovl: do not open non-data lower file for fsync
  2024-10-07 14:19 ` [PATCH v3 1/5] ovl: do not open non-data lower file for fsync Amir Goldstein
@ 2024-10-07 14:39   ` Al Viro
  2024-10-07 15:56     ` Amir Goldstein
  2024-10-07 16:55   ` Al Viro
  1 sibling, 1 reply; 17+ messages in thread
From: Al Viro @ 2024-10-07 14:39 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, Oct 07, 2024 at 04:19:21PM +0200, Amir Goldstein wrote:
> +static int ovl_upper_fdget(const struct file *file, struct fd *real, bool data)
> +{
> +	struct dentry *dentry = file_dentry(file);
> +	struct path realpath;
> +	enum ovl_path_type type;
> +
> +	if (data)
> +		type = ovl_path_realdata(dentry, &realpath);
> +	else
> +		type = ovl_path_real(dentry, &realpath);
> +
> +	real->word = 0;
> +	/* Not interested in lower nor in upper meta if data was requested */
> +	if (!OVL_TYPE_UPPER(type) || (data && OVL_TYPE_MERGE(type)))
> +		return 0;
> +
> +	return ovl_real_fdget_path(file, real, &realpath);
>  }
>  
>  static int ovl_open(struct inode *inode, struct file *file)
> @@ -394,16 +411,14 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  	if (ret <= 0)
>  		return ret;
>  
> -	ret = ovl_real_fdget_meta(file, &real, !datasync);
> -	if (ret)
> +	/* Don't sync lower file for fear of receiving EROFS error */
> +	ret = ovl_upper_fdget(file, &real, datasync);
> +	if (ret || fd_empty(real))
>  		return ret;

Is there any real point in keeping ovl_upper_fdget() separate from the
only caller?  Note that the checks for type make a lot more sense
in ovl_fsync() than buried in a separate helper and this fd_empty()
is a "do we have the wrong type?" check in disguise.

Why not just expand it at the call site?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/5] ovl: allocate a container struct ovl_file for ovl private context
  2024-10-07 14:19 ` [PATCH v3 2/5] ovl: allocate a container struct ovl_file for ovl private context Amir Goldstein
@ 2024-10-07 14:43   ` Al Viro
  2024-10-07 14:55     ` Miklos Szeredi
  0 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2024-10-07 14:43 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, Oct 07, 2024 at 04:19:22PM +0200, Amir Goldstein wrote:
> Instead of using ->private_data to point at realfile directly, so
> that we can add more context per ovl open file.

Hmm...  That'll cost you an extra deref to get to underlying file.
Cache footprint might get unpleasant...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/5] ovl: allocate a container struct ovl_file for ovl private context
  2024-10-07 14:43   ` Al Viro
@ 2024-10-07 14:55     ` Miklos Szeredi
  0 siblings, 0 replies; 17+ messages in thread
From: Miklos Szeredi @ 2024-10-07 14:55 UTC (permalink / raw)
  To: Al Viro; +Cc: Amir Goldstein, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, 7 Oct 2024 at 16:43, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Oct 07, 2024 at 04:19:22PM +0200, Amir Goldstein wrote:
> > Instead of using ->private_data to point at realfile directly, so
> > that we can add more context per ovl open file.
>
> Hmm...  That'll cost you an extra deref to get to underlying file.
> Cache footprint might get unpleasant...

Yes, that's an extra deref.  But stacking is bound to greatly increase
the cache footprint anyway, so this doesn't look a serious worry to
me.  I don't think overlayfs is at the point where we are ready to
optimize at the cache access level.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/5] ovl: do not open non-data lower file for fsync
  2024-10-07 14:39   ` Al Viro
@ 2024-10-07 15:56     ` Amir Goldstein
  2024-10-07 17:04       ` Al Viro
  0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-10-07 15:56 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, Oct 7, 2024 at 4:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Oct 07, 2024 at 04:19:21PM +0200, Amir Goldstein wrote:
> > +static int ovl_upper_fdget(const struct file *file, struct fd *real, bool data)
> > +{
> > +     struct dentry *dentry = file_dentry(file);
> > +     struct path realpath;
> > +     enum ovl_path_type type;
> > +
> > +     if (data)
> > +             type = ovl_path_realdata(dentry, &realpath);
> > +     else
> > +             type = ovl_path_real(dentry, &realpath);
> > +
> > +     real->word = 0;
> > +     /* Not interested in lower nor in upper meta if data was requested */
> > +     if (!OVL_TYPE_UPPER(type) || (data && OVL_TYPE_MERGE(type)))
> > +             return 0;
> > +
> > +     return ovl_real_fdget_path(file, real, &realpath);
> >  }
> >
> >  static int ovl_open(struct inode *inode, struct file *file)
> > @@ -394,16 +411,14 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >       if (ret <= 0)
> >               return ret;
> >
> > -     ret = ovl_real_fdget_meta(file, &real, !datasync);
> > -     if (ret)
> > +     /* Don't sync lower file for fear of receiving EROFS error */
> > +     ret = ovl_upper_fdget(file, &real, datasync);
> > +     if (ret || fd_empty(real))
> >               return ret;
>
> Is there any real point in keeping ovl_upper_fdget() separate from the
> only caller?  Note that the checks for type make a lot more sense
> in ovl_fsync() than buried in a separate helper and this fd_empty()
> is a "do we have the wrong type?" check in disguise.
>
> Why not just expand it at the call site?

You are right (again) I open code it, it looks much better:

        /* Don't sync lower file for fear of receiving EROFS error */
-       upperfile = ovl_upper_file(file, datasync);
-       if (IS_ERR_OR_NULL(upperfile))
+       type = ovl_path_type(dentry);
+       if (!OVL_TYPE_UPPER(type) || (datasync && OVL_TYPE_MERGE(type)))
+               return 0;
+
+       ovl_path_upper(dentry, &upperpath);
+       upperfile = ovl_real_file_path(file, &upperpath);
+       if (IS_ERR(upperfile))
                return PTR_ERR(upperfile);


Thanks,
Amir.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/5] ovl: do not open non-data lower file for fsync
  2024-10-07 14:19 ` [PATCH v3 1/5] ovl: do not open non-data lower file for fsync Amir Goldstein
  2024-10-07 14:39   ` Al Viro
@ 2024-10-07 16:55   ` Al Viro
  2024-10-07 17:13     ` Al Viro
  1 sibling, 1 reply; 17+ messages in thread
From: Al Viro @ 2024-10-07 16:55 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, Oct 07, 2024 at 04:19:21PM +0200, Amir Goldstein wrote:
> +static int ovl_real_fdget_path(const struct file *file, struct fd *real,
> +			       struct path *realpath)

> -	if (allow_meta) {
> -		ovl_path_real(dentry, &realpath);
> -	} else {
> -		/* lazy lookup and verify of lowerdata */
> -		err = ovl_verify_lowerdata(dentry);
This check went
> -		if (err)
> -			return err;
> -
> -		ovl_path_realdata(dentry, &realpath);
> -	}

> @@ -138,7 +129,33 @@ static int ovl_real_fdget(const struct file *file, struct fd *real)
>  		return 0;
>  	}
>  
> -	return ovl_real_fdget_meta(file, real, false);
> +	/* lazy lookup and verify of lowerdata */
> +	err = ovl_verify_lowerdata(dentry);

... here

> +	if (err)
> +		return err;
> +
> +	ovl_path_realdata(dentry, &realpath);

> +static int ovl_upper_fdget(const struct file *file, struct fd *real, bool data)
> +{
> +	struct dentry *dentry = file_dentry(file);
> +	struct path realpath;
> +	enum ovl_path_type type;
> +
> +	if (data)
> +		type = ovl_path_realdata(dentry, &realpath);

... but not here.

I can see the point of not doing that in ->fsync() after we'd already
done ovl_verify_lowerdata() at open time, but what's different about
->read_iter() and friends that also come only after ->open()?
IOW, why is fdatasync() different from other data-access cases?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/5] ovl: do not open non-data lower file for fsync
  2024-10-07 15:56     ` Amir Goldstein
@ 2024-10-07 17:04       ` Al Viro
  0 siblings, 0 replies; 17+ messages in thread
From: Al Viro @ 2024-10-07 17:04 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, Oct 07, 2024 at 05:56:51PM +0200, Amir Goldstein wrote:

> You are right (again) I open code it, it looks much better:
> 
>         /* Don't sync lower file for fear of receiving EROFS error */
> -       upperfile = ovl_upper_file(file, datasync);
> -       if (IS_ERR_OR_NULL(upperfile))
> +       type = ovl_path_type(dentry);
> +       if (!OVL_TYPE_UPPER(type) || (datasync && OVL_TYPE_MERGE(type)))
> +               return 0;
> +
> +       ovl_path_upper(dentry, &upperpath);

OK, this answers my other question (re why skipping ovl_verify_lowerdata()
in ovl_upper_file() is OK); you don't want to mess with the lowerdata
anyway, so...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/5] ovl: do not open non-data lower file for fsync
  2024-10-07 16:55   ` Al Viro
@ 2024-10-07 17:13     ` Al Viro
  0 siblings, 0 replies; 17+ messages in thread
From: Al Viro @ 2024-10-07 17:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, Oct 07, 2024 at 05:55:40PM +0100, Al Viro wrote:

> > +static int ovl_upper_fdget(const struct file *file, struct fd *real, bool data)
> > +{
> > +	struct dentry *dentry = file_dentry(file);
> > +	struct path realpath;
> > +	enum ovl_path_type type;
> > +
> > +	if (data)
> > +		type = ovl_path_realdata(dentry, &realpath);
> 
> ... but not here.
> 
> I can see the point of not doing that in ->fsync() after we'd already
> done ovl_verify_lowerdata() at open time, but what's different about
> ->read_iter() and friends that also come only after ->open()?
> IOW, why is fdatasync() different from other data-access cases?

Nevermind that one - the answer is that ovl_path_realdata()
calls ovl_path_lowerdata() only in case when it sees
!OVL_TYPE_UPPER(type) || OVL_TYPE_MERGE(type), which guarantees
that the type check below that if (data) will fail anyway
(check being (!OVL_TYPE_UPPER(type) || (data && OVL_TYPE_MERGE(type))).

So this reduces the fdatasync case to
	if (!OVL_TYPE_UPPER(type) || OVL_TYPE_MERGE(type))
		fail;
	ovl_path_upper(dentry, &realpath);
just as the fsync case reduces to
	if (!OVL_TYPE_UPPER(type))
		fail;
	ovl_path_upper(dentry, &realpath);

making any lowerpath-related stuff irrelevant.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Store overlay real upper file in ovl_file
  2024-10-07 14:19 [PATCH v3 0/5] Store overlay real upper file in ovl_file Amir Goldstein
                   ` (4 preceding siblings ...)
  2024-10-07 14:19 ` [PATCH v3 5/5] ovl: convert ovl_real_fdget() callers to ovl_real_file() Amir Goldstein
@ 2024-10-17  4:52 ` Al Viro
  2024-10-17  8:18   ` Amir Goldstein
  5 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2024-10-17  4:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, linux-unionfs

On Mon, Oct 07, 2024 at 04:19:20PM +0200, Amir Goldstein wrote:
> Hi all,
> 
> This is v3 of the code to avoid temporary backing file opens in
> overlayfs, taking into account Al's and Miklos' comments on v2 [1].
> 
> If no further comments, this is going for overlayfs-next.

BTW, looking through the vicinity of that stuff:

int ovl_cleanup(struct ovl_fs *ofs, struct inode *wdir, struct dentry *wdentry)
{
        int err;

        dget(wdentry);
        if (d_is_dir(wdentry))
                err = ovl_do_rmdir(ofs, wdir, wdentry);
        else
                err = ovl_do_unlink(ofs, wdir, wdentry);
        dput(wdentry);

        if (err) {
                pr_err("cleanup of '%pd2' failed (%i)\n",
                       wdentry, err);
        }

        return err;
}

What the hell are those dget()/dput() doing there?  Not to mention an
access after dput(), both vfs_rmdir() and vfs_unlink() expect the
reference to dentry argument to be held by the caller and leave it
for the caller to dispose of.  What am I missing here?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Store overlay real upper file in ovl_file
  2024-10-17  4:52 ` [PATCH v3 0/5] Store overlay real upper file in ovl_file Al Viro
@ 2024-10-17  8:18   ` Amir Goldstein
  2024-10-17 13:52     ` Miklos Szeredi
  0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-10-17  8:18 UTC (permalink / raw)
  To: Al Viro, Miklos Szeredi; +Cc: Christian Brauner, linux-fsdevel, linux-unionfs

On Thu, Oct 17, 2024 at 6:52 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Oct 07, 2024 at 04:19:20PM +0200, Amir Goldstein wrote:
> > Hi all,
> >
> > This is v3 of the code to avoid temporary backing file opens in
> > overlayfs, taking into account Al's and Miklos' comments on v2 [1].
> >
> > If no further comments, this is going for overlayfs-next.
>
> BTW, looking through the vicinity of that stuff:
>
> int ovl_cleanup(struct ovl_fs *ofs, struct inode *wdir, struct dentry *wdentry)
> {
>         int err;
>
>         dget(wdentry);
>         if (d_is_dir(wdentry))
>                 err = ovl_do_rmdir(ofs, wdir, wdentry);
>         else
>                 err = ovl_do_unlink(ofs, wdir, wdentry);
>         dput(wdentry);
>
>         if (err) {
>                 pr_err("cleanup of '%pd2' failed (%i)\n",
>                        wdentry, err);
>         }
>
>         return err;
> }
>
> What the hell are those dget()/dput() doing there?  Not to mention an
> access after dput(), both vfs_rmdir() and vfs_unlink() expect the
> reference to dentry argument to be held by the caller and leave it
> for the caller to dispose of.  What am I missing here?

It has been like that since the first upstream version.
My guess is that it is an attempt to avoid turning wdentry
into a negative dentry, which is not expected to be useful in
ovl_clear_empty() situations, but this is just a guess.

Miklos?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Store overlay real upper file in ovl_file
  2024-10-17  8:18   ` Amir Goldstein
@ 2024-10-17 13:52     ` Miklos Szeredi
  2024-10-17 18:13       ` Al Viro
  0 siblings, 1 reply; 17+ messages in thread
From: Miklos Szeredi @ 2024-10-17 13:52 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Al Viro, Christian Brauner, linux-fsdevel, linux-unionfs

On Thu, 17 Oct 2024 at 10:18, Amir Goldstein <amir73il@gmail.com> wrote:

> It has been like that since the first upstream version.
> My guess is that it is an attempt to avoid turning wdentry
> into a negative dentry, which is not expected to be useful in
> ovl_clear_empty() situations, but this is just a guess.

Yes.  This was discussed in a private thread before merging overlayfs upstream.

Copying relevant parts here:

----------
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat, 18 Oct 2014 at 10:18
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
<mszeredi@suse.cz>, David Howells <dhowells@redhat.com>, Sage Weil
<sage@inktank.com>


[Cc to Sage due to interesting ceph bit that has shown up from grepping -
see the very end]

On Sat, Oct 18, 2014 at 06:01:53AM +0100, Al Viro wrote:

First of all, I've just fixed a dumb braino in ovl_clear_empty(); assignment
to upper needed to be moved up to the added test.  Force-pushed to the same
branch - vfs.git#ovl-experimental.

> As for the "what filesystems are we OK with", I wonder if looking into the
> sucker's ->s_d_op (or ->d_op of root of lower tree, for that matter) would
> be a good approximation.  I really think that ->d_{weak_,}revalidate() in
> there is complete no-go, ditto for ->d_manage() and ->d_automount() and
> I would consider ->d_compare() or ->d_hash() as a cause to be _very_ cautious.
>
> Alternatively, we could just go ahead and add FS_OK_FOR_OVERLAY_LOWER into
> fs type flags and mark the obvious good ones.  It's not _that_ much work.
>
> I'd still like to hear details on the plans re d_path(); I don't consider
> that a deal-breaker, but we'd better have some clear idea of what we are
> getting into.

BTW, why on the Earth are you pinning that ->__upperdentry twice?  The
comment about d_delete() makes no sense whatsoever - anything other than
overlayfs itself would have to grab a reference to call that d_delete(),
which would give you refcount greater than 1 automatically.  So it would
have to be overlayfs passing that thing to d_delete() or something that
would call it, right?  Now, d_delete() itself isn't called there at all.
Which leaves passing the sucker to something outside that would call
d_delete().  Now, what would it be?

Here's the full list of d_delete() callers:
arch/s390/hypfs/inode.c:82:     d_delete(dentry);
drivers/infiniband/hw/ipath/ipath_fs.c:318:     d_delete(dir);
drivers/infiniband/hw/qib/qib_fs.c:512: d_delete(dir);
drivers/usb/gadget/function/f_fs.c:1560:
d_delete(epfile->dentry);
drivers/usb/gadget/legacy/inode.c:1611:         d_delete (dentry);
fs/btrfs/ioctl.c:2507:          d_delete(dentry);
fs/ceph/dir.c:893:              d_delete(dentry);
fs/ceph/inode.c:1114:                           d_delete(dn);
fs/ceph/inode.c:1223:                           d_delete(dn);
fs/ceph/inode.c:1395:                   d_delete(dn);
fs/configfs/dir.c:643:          d_delete(child);
fs/configfs/dir.c:834:                  d_delete(dentry);
fs/configfs/dir.c:880:                  d_delete(dentry);
fs/configfs/dir.c:1475:                         d_delete(new_dentry);
fs/configfs/dir.c:1721: d_delete(dentry);
fs/debugfs/inode.c:483:                         d_delete(dentry);
fs/devpts/inode.c:666:  d_delete(dentry);
fs/efivarfs/file.c:50:          d_delete(file->f_dentry);
fs/fuse/dir.c:1061:                     d_delete(entry);
fs/namei.c:3586:                d_delete(dentry);
fs/namei.c:3702:                d_delete(dentry);
fs/nfs/dir.c:1760:              d_delete(dentry);
fs/nfs/nfs4proc.c:2231:                         d_delete(dentry);
fs/ocfs2/dlmglue.c:3752:                d_delete(dentry);
fs/reiserfs/xattr.c:95:         d_delete(dentry);
fs/reiserfs/xattr.c:111:                d_delete(dentry);
net/sunrpc/rpc_pipe.c:607:      d_delete(dentry);
net/sunrpc/rpc_pipe.c:634:      d_delete(dentry);
security/selinux/selinuxfs.c:1212:                      d_delete(d);

We are talking about the *upper* layer; that excludes most of those
guys.  At the very least, you want that fs to support rename and xattrs.
So hypfs, infinibarf ones, gadgetfs, configfs, debugfs, devpts, efivarfs,
sunrpc and selinuxfs are right out.  Moreover, all of those are not in
the codepaths reachable from overlayfs - all of that is removal of
object driven by external event.  And we end up using a reference other
than what overlayfs would be holding.  The same goes for reiserfs
xattr code (it calls d_delete() for references it has acquired itself)
and for ocfs2.  NFS is also not an option for upper layer, according to
overlayfs docs.  FUSE is in the same boat as ocfs2 and reiserfs - we acquire
the reference by d_lookup() in the same function.  The same goes for
btrfs caller (s/d_lookup/lookup_one_len/), not to mention that this code
won't be called by overlayfs.  What's left?

fs/ceph/dir.c:893:              d_delete(dentry);
ceph_unlink().

fs/ceph/inode.c:1114:                           d_delete(dn);
ceph_fill_trace(), dn comes from d_lookup().  Not an issue.

fs/ceph/inode.c:1395:                   d_delete(dn);
ceph_readdir_prepopulate(), dn comes from d_lookup().  Not an issue.

fs/ceph/inode.c:1223:                           d_delete(dn);
ceph_fill_trace(), again.  Hell knows - it's really hard to read ;-/

fs/namei.c:3586:                d_delete(dentry);
vfs_rmdir()

fs/namei.c:3702:                d_delete(dentry);
vfs_unlink()

Now, ceph_unlink() can come only from vfs_unlink().  So we are down to
the following: victim of vfs_unlink(), victim of vfs_rmdir(), _maybe_
something strange coming from that ceph_fill_trace() callsite.

We definitely do have vfs_unlink() and vfs_rmdir() calls in overlayfs.
Not many, though - there's ovl_remove_upper(), calling them directly,
and there's ovl_cleanup(), calling them via ovl_do_{unlink,rmdir}()
wrappers.  For one thing, we could do dget()/dput() in both of those guys.
However, looking at the callers of ovl_cleanup() shows that with two
exceptions we have already grabbed/dropped dentry around the callsite.
Exceptions are ovl_clear_empty() and ovl_remove_and_whiteout().
Could as well put that dget()/dput() in those two, rather than in
ovl_clear_empty()...

IOW, modulo that ceph thing we could trivially avoid that double reference
to ->__upperdentry, just by doing a temporary dget()/dput() in a few
places in fs/overlayfs/dir.c.  Objections?  A bunch of code becomes simpler
that way, IMO...

Question to Sage: what's that d_delete() in ceph_fill_trace() about?
It's this bit:
                /* null dentry? */
                if (!rinfo->head->is_target) {
                        dout("fill_trace null dentry\n");
                        if (dn->d_inode) {
                                dout("d_delete %p\n", dn);
                                d_delete(dn);
                        } else {
                                dout("d_instantiate %p NULL\n", dn);
                                d_instantiate(dn, NULL);
                                if (have_lease && d_unhashed(dn))
                                        d_rehash(dn);
                                update_dentry_lease(dn, rinfo->dlease,
                                                    session,
                                                    req->r_request_started);
                        }
                        goto done;
                }
What codepaths could lead us there and where could that dentry have come
from?  Overlayfs aside, the things can get rather interesting if it could,
e.g. turn out to be an existing mountpoint...


----------
From: David Howells <dhowells@redhat.com>
Date: Sun, 19 Oct 2014 at 11:32
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: <dhowells@redhat.com>, Miklos Szeredi <miklos@szeredi.hu>, Linus
Torvalds <torvalds@linux-foundation.org>, <mszeredi@suse.cz>


Al Viro <viro@ZenIV.linux.org.uk> wrote:

> David, could you give a braindump on selinux issues?  I hadn't watched your
> conversation with Ian closely, so I'd rather avoid doing second-hand
> retelling...

The problem boils down to this: When they fall through from the top layer to a
lower source layer, Overlayfs and unionmount both set up struct file to point
directly to the file in the lower layer.  This is then passed to various
security_xxx() functions.

For labelled-inode-based LSM's this means they see the label on the lower
inode, not the label on the overlay/union inode - indeed in unionmount, there
*is* no upper/union inode.

Overlayfs is more complicated than unionmount in that there are three layers
and it falls through from the overlay layer to the upper layer (ie. the change
stash) too.  In this case also the overlay inode is unavailable to the LSM.

Unionmount mitigates the lower-layer label problem by pointing file->f_path at
the union dentry and file->f_inode at the lower inode (and
file->f_path->dentry->d_fallthru at the lower dentry).

Further, unionmount has no upper layer problem since the changes are stored in
the union layer itself.


Having discussed this with various people in the context of docker, a tentative
consensus has been reached:

 (1) The docker source tree (ie. the lower layer) will all be under a single
     label.

 (2) The docker root (ie. the overlay/union layer) will all be under a single,
     but different label set on the overlay mount (and each docker root may be
     under its own label).

 (3) Inodes in the overlayfs upper layer will be given the overlay label.

 (4) A security_copy_up() operation will be provided to set the label on the
     upper inode when it is created.

 (5) A security_copy_up_xattr() operation will be provided to vet (and maybe
     modify) each xattr as it is copied up.

 (6) An extra label slot will be placed in struct file_security_struct to hold
     the overlay label.

 (7) security_file_open() will need to be given both the overlay and lower
     dentries.

     For overlayfs, the way this probably should be done is file->f_path should
     be set to point to the overlay dentry (thus getting /proc right) and
     file->f_inode to the lower file and make use of d_fallthru in the overlay
     dentry in common with unionmount.

 (8) When the lower file is accessed, both the lower and overlay labels should
     be checked and audited.

 (9) When the upper file is accessed, only the overlay label needs to be
     checked and audited.

David


----------
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Mon, 20 Oct 2014 at 17:47
To: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>, Linus Torvalds
<torvalds@linux-foundation.org>, <mszeredi@suse.cz>


On Sun, Oct 19, 2014 at 10:31:52AM +0100, David Howells wrote:

>      For overlayfs, the way this probably should be done is file->f_path should
>      be set to point to the overlay dentry (thus getting /proc right) and
>      file->f_inode to the lower file and make use of d_fallthru in the overlay
>      dentry in common with unionmount.

To elaborate a bit: a _lot_ of places in filesystems that used to use
->f_path.dentry->d_inode had been eliminated in favour of file_inode(...)
and all the remaining ones ought to follow.  With that done (I was actually
planning to do whack-a-mole session on those guys after most of this cycle
merges would be done - Linus, would you accept that in -rc2?) we get
surprisingly few places that even look at ->f_path.dentry.

Some of those are refering to ->d_name; we need to review those for other
reasons (potential rename() races), but for unionmount/overlayfs purposes
we couldn't care less which of dentries is used - both overlayfs and
underlying fs dentry have the same name.  FWIW, a bunch of uses are in
printks, and those should become %pD...

A bunch of places uses ->f_path.dentry->d_sb to get the superblock by
file; file_inode()->i_sb would do just fine in filesystems.  And places
like that *outside* of filesystems need a bit of review - the question is
which superblock do we want?  That of overlayfs or that of the layer?
The latter would be file_inode()->i_sb, again.  The former would be a problem
with overlayfs in its current form; with leaving f_path to point to overlayfs
it would work fine.

dir_emit_dot() and dir_emit_dotdot() use ->f_path.dentry, but those are not
problem - overlayfs explicitly opens directory in a layer.

AFAICS, nothing of what remains is on paths hot enough to really care about
an extra dereference.  So I think that after the dust from cleanups settles,
we'll be able to add an inlined helper usable for accesses to file's dentries,
and ban open-coded ->f_path.dentry in filesystems.


----------
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Wed, 22 Oct 2014 at 05:36
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, <mszeredi@suse.cz>


On Fri, Oct 17, 2014 at 05:30:52PM +0200, Miklos Szeredi wrote:

> Will do patches ASAP (which probably means next week) for all but having proper
> d_path() on the leaves.

Ping?  d_path() will obviously have to wait (my preference would be to leave
proper overlayfs dentry in ->f_path.dentry, set ->f_inode to one from the
layer and make sure that all filesystem code will DTRT; we are *very* close
to that already), but that's
        a) doable after the thing went in (it's not that much rewrite and
VFS-side it will probably just mean death to ->dentry_open() - sure, it's
not nice to put the method in just for one cycle, but it's not particulary
tragic, especially if it's clearly marked as "it's going away very soon,
do not rely on it") and
        b) is next cycle fodder

The same goes for the weirdness with double dget() on __upperdentry - it's
absolutely self-contained and we can deal with it later.  I would obviously
prefer fewer odd warts when it goes in, but this one isn't a bug per se.

rmdir() failure, OTOH, is one.  So's the memory footprint of cached
union of directories, seeing that every struct file over a directory
gets a copy of its own.  So's accepting layers with non-trivial ->d_revalidate,
->d_compare, ->d_hash, ->d_automount and ->d_manage.

The interim branch is in vfs.git#ovl-experimental; do you want me to post it
as-is same way you posted previous iterations?

Another thing: one more mail in that thread had bounced from google
mailsewers.  This time it claimed that delivery to Linus has failed.
I've no idea if they report all addresses; I've put the bounce message on
ftp.linux.org.uk/pub/viro/bounced2.  I really wonder what's going on
with those filters...


----------
From: Miklos Szeredi <miklos@szeredi.hu>
Date: Wed, 22 Oct 2014 at 10:12
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, <mszeredi@suse.cz>


On Wed, Oct 22, 2014 at 04:35:58AM +0100, Al Viro wrote:
> On Fri, Oct 17, 2014 at 05:30:52PM +0200, Miklos Szeredi wrote:
>
> > Will do patches ASAP (which probably means next week) for all but having proper
> > d_path() on the leaves.
>
> Ping?  d_path() will obviously have to wait (my preference would be to leave
> proper overlayfs dentry in ->f_path.dentry, set ->f_inode to one from the
> layer and make sure that all filesystem code will DTRT; we are *very* close
> to that already), but that's
>       a) doable after the thing went in (it's not that much rewrite and
> VFS-side it will probably just mean death to ->dentry_open() - sure, it's
> not nice to put the method in just for one cycle, but it's not particulary
> tragic, especially if it's clearly marked as "it's going away very soon,
> do not rely on it") and
>       b) is next cycle fodder
>
> The same goes for the weirdness with double dget() on __upperdentry - it's
> absolutely self-contained and we can deal with it later.

Done, but not terribly urgent for me either.

>  I would obviously
> prefer fewer odd warts when it goes in, but this one isn't a bug per se.
>
> rmdir() failure, OTOH, is one.

I looked at this, and found no bug: it does raise CAP_DAC_OVERRIDE in both
callers (ovl_do_remove() and ovl_rename2()).

>  So's the memory footprint of cached
> union of directories, seeing that every struct file over a directory
> gets a copy of its own.

Done, at the cost of 100 or so extra lines *and* is still DoS-able if the
directory is changed between reading it from the different file instances.

I've been poring over this last night, and the proper solution would be to
update the cache as it changes, so we have only one cache.  Not hard to do
conceptually, but not a small change either.

I'm wondering if it's OK to get i_mutex in ->release().  It would simplify the
locking...

>  So's accepting layers with non-trivial ->d_revalidate,
> ->d_compare, ->d_hash, ->d_automount and ->d_manage.

Done.

Also moved notify_change() into copy-up for setattr, so the attribute update is
atomic.

And a few other cleanups and fixes.

Updated the overlayfs.current branch here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git
overlayfs.current

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Store overlay real upper file in ovl_file
  2024-10-17 13:52     ` Miklos Szeredi
@ 2024-10-17 18:13       ` Al Viro
  0 siblings, 0 replies; 17+ messages in thread
From: Al Viro @ 2024-10-17 18:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Amir Goldstein, Christian Brauner, linux-fsdevel, linux-unionfs

On Thu, Oct 17, 2024 at 03:52:10PM +0200, Miklos Szeredi wrote:
> On Thu, 17 Oct 2024 at 10:18, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> > It has been like that since the first upstream version.
> > My guess is that it is an attempt to avoid turning wdentry
> > into a negative dentry, which is not expected to be useful in
> > ovl_clear_empty() situations, but this is just a guess.
> 
> Yes.  This was discussed in a private thread before merging overlayfs upstream.

10 years ago ;-/  Sorry, memories swapped out...

|> BTW, why on the Earth are you pinning that ->__upperdentry twice?  The
|> comment about d_delete() makes no sense whatsoever - anything other than
|> overlayfs itself would have to grab a reference to call that d_delete(),

[snip]

Out of curiosity, which callers pass anyone's upperdentry to ovl_cleanup()?
Note that if we get dentry reference not from ovl_dentry_upper() et.al.,
we'll get that protection for free.  From quick look it seems that the only
such callchain is from ovl_clear_empty()...

Oh, well...  Shifting it over there would be asking for trouble later on.
Might be worth a comment in ovl_cleanup() - it is subtle enough to be
confusing for readers.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-10-17 18:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-07 14:19 [PATCH v3 0/5] Store overlay real upper file in ovl_file Amir Goldstein
2024-10-07 14:19 ` [PATCH v3 1/5] ovl: do not open non-data lower file for fsync Amir Goldstein
2024-10-07 14:39   ` Al Viro
2024-10-07 15:56     ` Amir Goldstein
2024-10-07 17:04       ` Al Viro
2024-10-07 16:55   ` Al Viro
2024-10-07 17:13     ` Al Viro
2024-10-07 14:19 ` [PATCH v3 2/5] ovl: allocate a container struct ovl_file for ovl private context Amir Goldstein
2024-10-07 14:43   ` Al Viro
2024-10-07 14:55     ` Miklos Szeredi
2024-10-07 14:19 ` [PATCH v3 3/5] ovl: store upper real file in ovl_file struct Amir Goldstein
2024-10-07 14:19 ` [PATCH v3 4/5] ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path() Amir Goldstein
2024-10-07 14:19 ` [PATCH v3 5/5] ovl: convert ovl_real_fdget() callers to ovl_real_file() Amir Goldstein
2024-10-17  4:52 ` [PATCH v3 0/5] Store overlay real upper file in ovl_file Al Viro
2024-10-17  8:18   ` Amir Goldstein
2024-10-17 13:52     ` Miklos Szeredi
2024-10-17 18:13       ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).