public inbox for linux-unionfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
@ 2024-07-22 10:14 Fei Lv
  2024-07-22 13:56 ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Fei Lv @ 2024-07-22 10:14 UTC (permalink / raw)
  To: miklos, amir73il, linux-unionfs, linux-kernel, lianghuxu, feilv

For upper filesystem which does not enforce ordering on storing of
metadata changes(e.g. ubifs), when overlayfs file is modified for
the first time, copy up will create a copy of the lower file and
its parent directories in the upper layer. Permission lost of the
new upper parent directory was observed during power-cut stress test.

Fix by adding new mount opion "fsync=strict", make sure data/metadata of
copied up directory written to disk before renaming from tmp to final
destination.

Signed-off-by: Fei Lv <feilv@asrmicro.com>
---
V1 -> V2:
 1. change open flags from "O_LARGEFILE | O_WRONLY" to "O_RDONLY".
 2. change mount option to "fsync=ordered/strict/volatile".
 3. ovl_should_sync_strict() implies ovl_should_sync().
 4. remove redundant ovl_should_sync_strict from ovl_copy_up_meta_inode_data.
 5. update commit log.
 6. update documentation overlayfs.rst.

 Documentation/filesystems/overlayfs.rst | 39 +++++++++++++++++++++++++
 fs/overlayfs/copy_up.c                  | 18 ++++++++++++
 fs/overlayfs/ovl_entry.h                | 20 +++++++++++--
 fs/overlayfs/params.c                   | 33 ++++++++++++++++++---
 fs/overlayfs/super.c                    |  2 +-
 5 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 165514401441..a783e57bdb57 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -742,6 +742,45 @@ controlled by the "uuid" mount option, which supports these values:
     mounted with "uuid=on".
 
 
+Durability and copy up
+----------------------
+
+The fsync(2) and fdatasync(2) system calls ensure that the metadata and
+data of a file, respectively, are safely written to the backing
+storage, which is expected to guarantee the existence of the information post
+system crash.
+
+Without the fdatasync(2) call, there is no guarantee that the observed
+data after a system crash will be either the old or the new data, but
+in practice, the observed data after crash is often the old or new data or a
+mix of both.
+
+When overlayfs file is modified for the first time, copy up will create
+a copy of the lower file and its parent directories in the upper layer.
+In case of a system crash, if fdatasync(2) was not called after the
+modification, the upper file could end up with no data at all (i.e.
+zeros), which would be an unusual outcome.  To avoid this experience,
+overlayfs calls fsync(2) on the upper file before completing the copy up with
+rename(2) to make the copy up "atomic".
+
+Depending on the backing filesystem (e.g. ubifs), fsync(2) before
+rename(2) may not be enough to provide the "atomic" copy up behavior
+and fsync(2) on the copied up parent directories is required as well.
+
+Overlayfs can be tuned to prefer performance or durability when storing
+to the underlying upper layer.  This is controlled by the "fsync" mount
+option, which supports these values:
+
+- "ordered": (default)
+    Call fsync(2) on upper file before completion of copy up.
+- "strict":
+    Call fsync(2) on upper file and directories before completion of copy up.
+- "volatile": [*]
+    Prefer performance over durability (see `Volatile mount`_)
+
+[*] The mount option "volatile" is an alias to "fsync=volatile".
+
+
 Volatile mount
 --------------
 
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index a5ef2005a2cc..d99a18afceb8 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -243,6 +243,21 @@ static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
 	return 0;
 }
 
+static int ovl_copy_up_sync(struct path *path)
+{
+	struct file *new_file;
+	int err;
+
+	new_file = ovl_path_open(path, O_RDONLY);
+	if (IS_ERR(new_file))
+		return PTR_ERR(new_file);
+
+	err = vfs_fsync(new_file, 0);
+	fput(new_file);
+
+	return err;
+}
+
 static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
 			    struct file *new_file, loff_t len)
 {
@@ -701,6 +716,9 @@ static int ovl_copy_up_metadata(struct ovl_copy_up_ctx *c, struct dentry *temp)
 		err = ovl_set_attr(ofs, temp, &c->stat);
 	inode_unlock(temp->d_inode);
 
+	if (!err && ovl_should_sync_strict(ofs))
+		err = ovl_copy_up_sync(&upperpath);
+
 	return err;
 }
 
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index cb449ab310a7..7f6d2effd5f1 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -5,6 +5,12 @@
  * Copyright (C) 2016 Red Hat, Inc.
  */
 
+enum {
+	OVL_FSYNC_ORDERED,
+	OVL_FSYNC_STRICT,
+	OVL_FSYNC_VOLATILE,
+};
+
 struct ovl_config {
 	char *upperdir;
 	char *workdir;
@@ -18,7 +24,7 @@ struct ovl_config {
 	int xino;
 	bool metacopy;
 	bool userxattr;
-	bool ovl_volatile;
+	int fsync_mode;
 };
 
 struct ovl_sb {
@@ -120,7 +126,17 @@ static inline struct ovl_fs *OVL_FS(struct super_block *sb)
 
 static inline bool ovl_should_sync(struct ovl_fs *ofs)
 {
-	return !ofs->config.ovl_volatile;
+	return ofs->config.fsync_mode != OVL_FSYNC_VOLATILE;
+}
+
+static inline bool ovl_should_sync_strict(struct ovl_fs *ofs)
+{
+	return ofs->config.fsync_mode == OVL_FSYNC_STRICT;
+}
+
+static inline bool ovl_is_volatile(struct ovl_config *config)
+{
+	return config->fsync_mode == OVL_FSYNC_VOLATILE;
 }
 
 static inline unsigned int ovl_numlower(struct ovl_entry *oe)
diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
index 4860fcc4611b..c4aac288b7e0 100644
--- a/fs/overlayfs/params.c
+++ b/fs/overlayfs/params.c
@@ -58,6 +58,7 @@ enum ovl_opt {
 	Opt_xino,
 	Opt_metacopy,
 	Opt_verity,
+	Opt_fsync,
 	Opt_volatile,
 };
 
@@ -139,6 +140,23 @@ static int ovl_verity_mode_def(void)
 	return OVL_VERITY_OFF;
 }
 
+static const struct constant_table ovl_parameter_fsync[] = {
+	{ "ordered",	OVL_FSYNC_ORDERED  },
+	{ "strict",	OVL_FSYNC_STRICT   },
+	{ "volatile",	OVL_FSYNC_VOLATILE },
+	{}
+};
+
+static const char *ovl_fsync_mode(struct ovl_config *config)
+{
+	return ovl_parameter_fsync[config->fsync_mode].name;
+}
+
+static int ovl_fsync_mode_def(void)
+{
+	return OVL_FSYNC_ORDERED;
+}
+
 const struct fs_parameter_spec ovl_parameter_spec[] = {
 	fsparam_string_empty("lowerdir",    Opt_lowerdir),
 	fsparam_string("lowerdir+",         Opt_lowerdir_add),
@@ -154,6 +172,7 @@ const struct fs_parameter_spec ovl_parameter_spec[] = {
 	fsparam_enum("xino",                Opt_xino, ovl_parameter_xino),
 	fsparam_enum("metacopy",            Opt_metacopy, ovl_parameter_bool),
 	fsparam_enum("verity",              Opt_verity, ovl_parameter_verity),
+	fsparam_enum("fsync",               Opt_fsync, ovl_parameter_fsync),
 	fsparam_flag("volatile",            Opt_volatile),
 	{}
 };
@@ -617,8 +636,11 @@ static int ovl_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case Opt_verity:
 		config->verity_mode = result.uint_32;
 		break;
+	case Opt_fsync:
+		config->fsync_mode = result.uint_32;
+		break;
 	case Opt_volatile:
-		config->ovl_volatile = true;
+		config->fsync_mode = OVL_FSYNC_VOLATILE;
 		break;
 	case Opt_userxattr:
 		config->userxattr = true;
@@ -802,9 +824,9 @@ int ovl_fs_params_verify(const struct ovl_fs_context *ctx,
 		config->index = false;
 	}
 
-	if (!config->upperdir && config->ovl_volatile) {
+	if (!config->upperdir && ovl_is_volatile(config)) {
 		pr_info("option \"volatile\" is meaningless in a non-upper mount, ignoring it.\n");
-		config->ovl_volatile = false;
+		config->fsync_mode = ovl_fsync_mode_def();
 	}
 
 	if (!config->upperdir && config->uuid == OVL_UUID_ON) {
@@ -997,8 +1019,11 @@ int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	if (ofs->config.metacopy != ovl_metacopy_def)
 		seq_printf(m, ",metacopy=%s",
 			   ofs->config.metacopy ? "on" : "off");
-	if (ofs->config.ovl_volatile)
+	if (ovl_is_volatile(&ofs->config))
 		seq_puts(m, ",volatile");
+	else if (ofs->config.fsync_mode != ovl_fsync_mode_def())
+		seq_printf(m, ",fsync=%s",
+			   ovl_fsync_mode(&ofs->config));
 	if (ofs->config.userxattr)
 		seq_puts(m, ",userxattr");
 	if (ofs->config.verity_mode != ovl_verity_mode_def())
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 06a231970cb5..824cbcf40523 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -750,7 +750,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
 	 * For volatile mount, create a incompat/volatile/dirty file to keep
 	 * track of it.
 	 */
-	if (ofs->config.ovl_volatile) {
+	if (ovl_is_volatile(&ofs->config)) {
 		err = ovl_create_volatile_dirty(ofs);
 		if (err < 0) {
 			pr_err("Failed to create volatile/dirty file.\n");

base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-07-22 10:14 [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict" Fei Lv
@ 2024-07-22 13:56 ` Amir Goldstein
  2024-08-23  9:51   ` Amir Goldstein
  2024-08-26 15:59   ` Miklos Szeredi
  0 siblings, 2 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-07-22 13:56 UTC (permalink / raw)
  To: Fei Lv; +Cc: miklos, linux-unionfs, linux-kernel, lianghuxu

On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
>
> For upper filesystem which does not enforce ordering on storing of
> metadata changes(e.g. ubifs), when overlayfs file is modified for
> the first time, copy up will create a copy of the lower file and
> its parent directories in the upper layer. Permission lost of the
> new upper parent directory was observed during power-cut stress test.
>
> Fix by adding new mount opion "fsync=strict", make sure data/metadata of
> copied up directory written to disk before renaming from tmp to final
> destination.
>
> Signed-off-by: Fei Lv <feilv@asrmicro.com>

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

but I'd also like to wait for an ACK from Miklos on this feature.

As for timing, we are in the middle of the merge window for 6.11-rc1,
so we have some time before this can be considered for 6.12.
I will be on vacation for most of this development cycle, so either
Miklos will be able to queue it for 6.12 or I may be able to do
near the end of the 6.11 cycle.

Thanks,
Amir.

> ---
> V1 -> V2:
>  1. change open flags from "O_LARGEFILE | O_WRONLY" to "O_RDONLY".
>  2. change mount option to "fsync=ordered/strict/volatile".
>  3. ovl_should_sync_strict() implies ovl_should_sync().
>  4. remove redundant ovl_should_sync_strict from ovl_copy_up_meta_inode_data.
>  5. update commit log.
>  6. update documentation overlayfs.rst.
>
>  Documentation/filesystems/overlayfs.rst | 39 +++++++++++++++++++++++++
>  fs/overlayfs/copy_up.c                  | 18 ++++++++++++
>  fs/overlayfs/ovl_entry.h                | 20 +++++++++++--
>  fs/overlayfs/params.c                   | 33 ++++++++++++++++++---
>  fs/overlayfs/super.c                    |  2 +-
>  5 files changed, 105 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> index 165514401441..a783e57bdb57 100644
> --- a/Documentation/filesystems/overlayfs.rst
> +++ b/Documentation/filesystems/overlayfs.rst
> @@ -742,6 +742,45 @@ controlled by the "uuid" mount option, which supports these values:
>      mounted with "uuid=on".
>
>
> +Durability and copy up
> +----------------------
> +
> +The fsync(2) and fdatasync(2) system calls ensure that the metadata and
> +data of a file, respectively, are safely written to the backing
> +storage, which is expected to guarantee the existence of the information post
> +system crash.
> +
> +Without the fdatasync(2) call, there is no guarantee that the observed
> +data after a system crash will be either the old or the new data, but
> +in practice, the observed data after crash is often the old or new data or a
> +mix of both.
> +
> +When overlayfs file is modified for the first time, copy up will create
> +a copy of the lower file and its parent directories in the upper layer.
> +In case of a system crash, if fdatasync(2) was not called after the
> +modification, the upper file could end up with no data at all (i.e.
> +zeros), which would be an unusual outcome.  To avoid this experience,
> +overlayfs calls fsync(2) on the upper file before completing the copy up with
> +rename(2) to make the copy up "atomic".
> +
> +Depending on the backing filesystem (e.g. ubifs), fsync(2) before
> +rename(2) may not be enough to provide the "atomic" copy up behavior
> +and fsync(2) on the copied up parent directories is required as well.
> +
> +Overlayfs can be tuned to prefer performance or durability when storing
> +to the underlying upper layer.  This is controlled by the "fsync" mount
> +option, which supports these values:
> +
> +- "ordered": (default)
> +    Call fsync(2) on upper file before completion of copy up.
> +- "strict":
> +    Call fsync(2) on upper file and directories before completion of copy up.
> +- "volatile": [*]
> +    Prefer performance over durability (see `Volatile mount`_)
> +
> +[*] The mount option "volatile" is an alias to "fsync=volatile".
> +
> +
>  Volatile mount
>  --------------
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index a5ef2005a2cc..d99a18afceb8 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -243,6 +243,21 @@ static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
>         return 0;
>  }
>
> +static int ovl_copy_up_sync(struct path *path)
> +{
> +       struct file *new_file;
> +       int err;
> +
> +       new_file = ovl_path_open(path, O_RDONLY);
> +       if (IS_ERR(new_file))
> +               return PTR_ERR(new_file);
> +
> +       err = vfs_fsync(new_file, 0);
> +       fput(new_file);
> +
> +       return err;
> +}
> +
>  static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
>                             struct file *new_file, loff_t len)
>  {
> @@ -701,6 +716,9 @@ static int ovl_copy_up_metadata(struct ovl_copy_up_ctx *c, struct dentry *temp)
>                 err = ovl_set_attr(ofs, temp, &c->stat);
>         inode_unlock(temp->d_inode);
>
> +       if (!err && ovl_should_sync_strict(ofs))
> +               err = ovl_copy_up_sync(&upperpath);
> +
>         return err;
>  }
>
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index cb449ab310a7..7f6d2effd5f1 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -5,6 +5,12 @@
>   * Copyright (C) 2016 Red Hat, Inc.
>   */
>
> +enum {
> +       OVL_FSYNC_ORDERED,
> +       OVL_FSYNC_STRICT,
> +       OVL_FSYNC_VOLATILE,
> +};
> +
>  struct ovl_config {
>         char *upperdir;
>         char *workdir;
> @@ -18,7 +24,7 @@ struct ovl_config {
>         int xino;
>         bool metacopy;
>         bool userxattr;
> -       bool ovl_volatile;
> +       int fsync_mode;
>  };
>
>  struct ovl_sb {
> @@ -120,7 +126,17 @@ static inline struct ovl_fs *OVL_FS(struct super_block *sb)
>
>  static inline bool ovl_should_sync(struct ovl_fs *ofs)
>  {
> -       return !ofs->config.ovl_volatile;
> +       return ofs->config.fsync_mode != OVL_FSYNC_VOLATILE;
> +}
> +
> +static inline bool ovl_should_sync_strict(struct ovl_fs *ofs)
> +{
> +       return ofs->config.fsync_mode == OVL_FSYNC_STRICT;
> +}
> +
> +static inline bool ovl_is_volatile(struct ovl_config *config)
> +{
> +       return config->fsync_mode == OVL_FSYNC_VOLATILE;
>  }
>
>  static inline unsigned int ovl_numlower(struct ovl_entry *oe)
> diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
> index 4860fcc4611b..c4aac288b7e0 100644
> --- a/fs/overlayfs/params.c
> +++ b/fs/overlayfs/params.c
> @@ -58,6 +58,7 @@ enum ovl_opt {
>         Opt_xino,
>         Opt_metacopy,
>         Opt_verity,
> +       Opt_fsync,
>         Opt_volatile,
>  };
>
> @@ -139,6 +140,23 @@ static int ovl_verity_mode_def(void)
>         return OVL_VERITY_OFF;
>  }
>
> +static const struct constant_table ovl_parameter_fsync[] = {
> +       { "ordered",    OVL_FSYNC_ORDERED  },
> +       { "strict",     OVL_FSYNC_STRICT   },
> +       { "volatile",   OVL_FSYNC_VOLATILE },
> +       {}
> +};
> +
> +static const char *ovl_fsync_mode(struct ovl_config *config)
> +{
> +       return ovl_parameter_fsync[config->fsync_mode].name;
> +}
> +
> +static int ovl_fsync_mode_def(void)
> +{
> +       return OVL_FSYNC_ORDERED;
> +}
> +
>  const struct fs_parameter_spec ovl_parameter_spec[] = {
>         fsparam_string_empty("lowerdir",    Opt_lowerdir),
>         fsparam_string("lowerdir+",         Opt_lowerdir_add),
> @@ -154,6 +172,7 @@ const struct fs_parameter_spec ovl_parameter_spec[] = {
>         fsparam_enum("xino",                Opt_xino, ovl_parameter_xino),
>         fsparam_enum("metacopy",            Opt_metacopy, ovl_parameter_bool),
>         fsparam_enum("verity",              Opt_verity, ovl_parameter_verity),
> +       fsparam_enum("fsync",               Opt_fsync, ovl_parameter_fsync),
>         fsparam_flag("volatile",            Opt_volatile),
>         {}
>  };
> @@ -617,8 +636,11 @@ static int ovl_parse_param(struct fs_context *fc, struct fs_parameter *param)
>         case Opt_verity:
>                 config->verity_mode = result.uint_32;
>                 break;
> +       case Opt_fsync:
> +               config->fsync_mode = result.uint_32;
> +               break;
>         case Opt_volatile:
> -               config->ovl_volatile = true;
> +               config->fsync_mode = OVL_FSYNC_VOLATILE;
>                 break;
>         case Opt_userxattr:
>                 config->userxattr = true;
> @@ -802,9 +824,9 @@ int ovl_fs_params_verify(const struct ovl_fs_context *ctx,
>                 config->index = false;
>         }
>
> -       if (!config->upperdir && config->ovl_volatile) {
> +       if (!config->upperdir && ovl_is_volatile(config)) {
>                 pr_info("option \"volatile\" is meaningless in a non-upper mount, ignoring it.\n");
> -               config->ovl_volatile = false;
> +               config->fsync_mode = ovl_fsync_mode_def();
>         }
>
>         if (!config->upperdir && config->uuid == OVL_UUID_ON) {
> @@ -997,8 +1019,11 @@ int ovl_show_options(struct seq_file *m, struct dentry *dentry)
>         if (ofs->config.metacopy != ovl_metacopy_def)
>                 seq_printf(m, ",metacopy=%s",
>                            ofs->config.metacopy ? "on" : "off");
> -       if (ofs->config.ovl_volatile)
> +       if (ovl_is_volatile(&ofs->config))
>                 seq_puts(m, ",volatile");
> +       else if (ofs->config.fsync_mode != ovl_fsync_mode_def())
> +               seq_printf(m, ",fsync=%s",
> +                          ovl_fsync_mode(&ofs->config));
>         if (ofs->config.userxattr)
>                 seq_puts(m, ",userxattr");
>         if (ofs->config.verity_mode != ovl_verity_mode_def())
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 06a231970cb5..824cbcf40523 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -750,7 +750,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
>          * For volatile mount, create a incompat/volatile/dirty file to keep
>          * track of it.
>          */
> -       if (ofs->config.ovl_volatile) {
> +       if (ovl_is_volatile(&ofs->config)) {
>                 err = ovl_create_volatile_dirty(ofs);
>                 if (err < 0) {
>                         pr_err("Failed to create volatile/dirty file.\n");
>
> base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
> --
> 2.45.2
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-07-22 13:56 ` Amir Goldstein
@ 2024-08-23  9:51   ` Amir Goldstein
  2024-08-23 11:42     ` Amir Goldstein
  2024-08-26 15:59   ` Miklos Szeredi
  1 sibling, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-08-23  9:51 UTC (permalink / raw)
  To: Fei Lv; +Cc: miklos, linux-unionfs, linux-kernel, lianghuxu

On Mon, Jul 22, 2024 at 3:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
> >
> > For upper filesystem which does not enforce ordering on storing of
> > metadata changes(e.g. ubifs), when overlayfs file is modified for
> > the first time, copy up will create a copy of the lower file and
> > its parent directories in the upper layer. Permission lost of the
> > new upper parent directory was observed during power-cut stress test.
> >
> > Fix by adding new mount opion "fsync=strict", make sure data/metadata of
> > copied up directory written to disk before renaming from tmp to final
> > destination.
> >
> > Signed-off-by: Fei Lv <feilv@asrmicro.com>
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>
> but I'd also like to wait for an ACK from Miklos on this feature.
>
> As for timing, we are in the middle of the merge window for 6.11-rc1,
> so we have some time before this can be considered for 6.12.
> I will be on vacation for most of this development cycle, so either
> Miklos will be able to queue it for 6.12 or I may be able to do
> near the end of the 6.11 cycle.
>

Miklos,

Please let me know what you think of this approach to handle ubifs upper.
If you like it, I can queue this up for v6.12.

Thanks,
Amir.

>
> > ---
> > V1 -> V2:
> >  1. change open flags from "O_LARGEFILE | O_WRONLY" to "O_RDONLY".
> >  2. change mount option to "fsync=ordered/strict/volatile".
> >  3. ovl_should_sync_strict() implies ovl_should_sync().
> >  4. remove redundant ovl_should_sync_strict from ovl_copy_up_meta_inode_data.
> >  5. update commit log.
> >  6. update documentation overlayfs.rst.
> >
> >  Documentation/filesystems/overlayfs.rst | 39 +++++++++++++++++++++++++
> >  fs/overlayfs/copy_up.c                  | 18 ++++++++++++
> >  fs/overlayfs/ovl_entry.h                | 20 +++++++++++--
> >  fs/overlayfs/params.c                   | 33 ++++++++++++++++++---
> >  fs/overlayfs/super.c                    |  2 +-
> >  5 files changed, 105 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> > index 165514401441..a783e57bdb57 100644
> > --- a/Documentation/filesystems/overlayfs.rst
> > +++ b/Documentation/filesystems/overlayfs.rst
> > @@ -742,6 +742,45 @@ controlled by the "uuid" mount option, which supports these values:
> >      mounted with "uuid=on".
> >
> >
> > +Durability and copy up
> > +----------------------
> > +
> > +The fsync(2) and fdatasync(2) system calls ensure that the metadata and
> > +data of a file, respectively, are safely written to the backing
> > +storage, which is expected to guarantee the existence of the information post
> > +system crash.
> > +
> > +Without the fdatasync(2) call, there is no guarantee that the observed
> > +data after a system crash will be either the old or the new data, but
> > +in practice, the observed data after crash is often the old or new data or a
> > +mix of both.
> > +
> > +When overlayfs file is modified for the first time, copy up will create
> > +a copy of the lower file and its parent directories in the upper layer.
> > +In case of a system crash, if fdatasync(2) was not called after the
> > +modification, the upper file could end up with no data at all (i.e.
> > +zeros), which would be an unusual outcome.  To avoid this experience,
> > +overlayfs calls fsync(2) on the upper file before completing the copy up with
> > +rename(2) to make the copy up "atomic".
> > +
> > +Depending on the backing filesystem (e.g. ubifs), fsync(2) before
> > +rename(2) may not be enough to provide the "atomic" copy up behavior
> > +and fsync(2) on the copied up parent directories is required as well.
> > +
> > +Overlayfs can be tuned to prefer performance or durability when storing
> > +to the underlying upper layer.  This is controlled by the "fsync" mount
> > +option, which supports these values:
> > +
> > +- "ordered": (default)
> > +    Call fsync(2) on upper file before completion of copy up.
> > +- "strict":
> > +    Call fsync(2) on upper file and directories before completion of copy up.
> > +- "volatile": [*]
> > +    Prefer performance over durability (see `Volatile mount`_)
> > +
> > +[*] The mount option "volatile" is an alias to "fsync=volatile".
> > +
> > +
> >  Volatile mount
> >  --------------
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index a5ef2005a2cc..d99a18afceb8 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -243,6 +243,21 @@ static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
> >         return 0;
> >  }
> >
> > +static int ovl_copy_up_sync(struct path *path)
> > +{
> > +       struct file *new_file;
> > +       int err;
> > +
> > +       new_file = ovl_path_open(path, O_RDONLY);
> > +       if (IS_ERR(new_file))
> > +               return PTR_ERR(new_file);
> > +
> > +       err = vfs_fsync(new_file, 0);
> > +       fput(new_file);
> > +
> > +       return err;
> > +}
> > +
> >  static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
> >                             struct file *new_file, loff_t len)
> >  {
> > @@ -701,6 +716,9 @@ static int ovl_copy_up_metadata(struct ovl_copy_up_ctx *c, struct dentry *temp)
> >                 err = ovl_set_attr(ofs, temp, &c->stat);
> >         inode_unlock(temp->d_inode);
> >
> > +       if (!err && ovl_should_sync_strict(ofs))
> > +               err = ovl_copy_up_sync(&upperpath);
> > +
> >         return err;
> >  }
> >
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index cb449ab310a7..7f6d2effd5f1 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -5,6 +5,12 @@
> >   * Copyright (C) 2016 Red Hat, Inc.
> >   */
> >
> > +enum {
> > +       OVL_FSYNC_ORDERED,
> > +       OVL_FSYNC_STRICT,
> > +       OVL_FSYNC_VOLATILE,
> > +};
> > +
> >  struct ovl_config {
> >         char *upperdir;
> >         char *workdir;
> > @@ -18,7 +24,7 @@ struct ovl_config {
> >         int xino;
> >         bool metacopy;
> >         bool userxattr;
> > -       bool ovl_volatile;
> > +       int fsync_mode;
> >  };
> >
> >  struct ovl_sb {
> > @@ -120,7 +126,17 @@ static inline struct ovl_fs *OVL_FS(struct super_block *sb)
> >
> >  static inline bool ovl_should_sync(struct ovl_fs *ofs)
> >  {
> > -       return !ofs->config.ovl_volatile;
> > +       return ofs->config.fsync_mode != OVL_FSYNC_VOLATILE;
> > +}
> > +
> > +static inline bool ovl_should_sync_strict(struct ovl_fs *ofs)
> > +{
> > +       return ofs->config.fsync_mode == OVL_FSYNC_STRICT;
> > +}
> > +
> > +static inline bool ovl_is_volatile(struct ovl_config *config)
> > +{
> > +       return config->fsync_mode == OVL_FSYNC_VOLATILE;
> >  }
> >
> >  static inline unsigned int ovl_numlower(struct ovl_entry *oe)
> > diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
> > index 4860fcc4611b..c4aac288b7e0 100644
> > --- a/fs/overlayfs/params.c
> > +++ b/fs/overlayfs/params.c
> > @@ -58,6 +58,7 @@ enum ovl_opt {
> >         Opt_xino,
> >         Opt_metacopy,
> >         Opt_verity,
> > +       Opt_fsync,
> >         Opt_volatile,
> >  };
> >
> > @@ -139,6 +140,23 @@ static int ovl_verity_mode_def(void)
> >         return OVL_VERITY_OFF;
> >  }
> >
> > +static const struct constant_table ovl_parameter_fsync[] = {
> > +       { "ordered",    OVL_FSYNC_ORDERED  },
> > +       { "strict",     OVL_FSYNC_STRICT   },
> > +       { "volatile",   OVL_FSYNC_VOLATILE },
> > +       {}
> > +};
> > +
> > +static const char *ovl_fsync_mode(struct ovl_config *config)
> > +{
> > +       return ovl_parameter_fsync[config->fsync_mode].name;
> > +}
> > +
> > +static int ovl_fsync_mode_def(void)
> > +{
> > +       return OVL_FSYNC_ORDERED;
> > +}
> > +
> >  const struct fs_parameter_spec ovl_parameter_spec[] = {
> >         fsparam_string_empty("lowerdir",    Opt_lowerdir),
> >         fsparam_string("lowerdir+",         Opt_lowerdir_add),
> > @@ -154,6 +172,7 @@ const struct fs_parameter_spec ovl_parameter_spec[] = {
> >         fsparam_enum("xino",                Opt_xino, ovl_parameter_xino),
> >         fsparam_enum("metacopy",            Opt_metacopy, ovl_parameter_bool),
> >         fsparam_enum("verity",              Opt_verity, ovl_parameter_verity),
> > +       fsparam_enum("fsync",               Opt_fsync, ovl_parameter_fsync),
> >         fsparam_flag("volatile",            Opt_volatile),
> >         {}
> >  };
> > @@ -617,8 +636,11 @@ static int ovl_parse_param(struct fs_context *fc, struct fs_parameter *param)
> >         case Opt_verity:
> >                 config->verity_mode = result.uint_32;
> >                 break;
> > +       case Opt_fsync:
> > +               config->fsync_mode = result.uint_32;
> > +               break;
> >         case Opt_volatile:
> > -               config->ovl_volatile = true;
> > +               config->fsync_mode = OVL_FSYNC_VOLATILE;
> >                 break;
> >         case Opt_userxattr:
> >                 config->userxattr = true;
> > @@ -802,9 +824,9 @@ int ovl_fs_params_verify(const struct ovl_fs_context *ctx,
> >                 config->index = false;
> >         }
> >
> > -       if (!config->upperdir && config->ovl_volatile) {
> > +       if (!config->upperdir && ovl_is_volatile(config)) {
> >                 pr_info("option \"volatile\" is meaningless in a non-upper mount, ignoring it.\n");
> > -               config->ovl_volatile = false;
> > +               config->fsync_mode = ovl_fsync_mode_def();
> >         }
> >
> >         if (!config->upperdir && config->uuid == OVL_UUID_ON) {
> > @@ -997,8 +1019,11 @@ int ovl_show_options(struct seq_file *m, struct dentry *dentry)
> >         if (ofs->config.metacopy != ovl_metacopy_def)
> >                 seq_printf(m, ",metacopy=%s",
> >                            ofs->config.metacopy ? "on" : "off");
> > -       if (ofs->config.ovl_volatile)
> > +       if (ovl_is_volatile(&ofs->config))
> >                 seq_puts(m, ",volatile");
> > +       else if (ofs->config.fsync_mode != ovl_fsync_mode_def())
> > +               seq_printf(m, ",fsync=%s",
> > +                          ovl_fsync_mode(&ofs->config));
> >         if (ofs->config.userxattr)
> >                 seq_puts(m, ",userxattr");
> >         if (ofs->config.verity_mode != ovl_verity_mode_def())
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 06a231970cb5..824cbcf40523 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -750,7 +750,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
> >          * For volatile mount, create a incompat/volatile/dirty file to keep
> >          * track of it.
> >          */
> > -       if (ofs->config.ovl_volatile) {
> > +       if (ovl_is_volatile(&ofs->config)) {
> >                 err = ovl_create_volatile_dirty(ofs);
> >                 if (err < 0) {
> >                         pr_err("Failed to create volatile/dirty file.\n");
> >
> > base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
> > --
> > 2.45.2
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-23  9:51   ` Amir Goldstein
@ 2024-08-23 11:42     ` Amir Goldstein
  2024-08-26  6:56       ` 答复: " Lv Fei(吕飞)
  0 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-08-23 11:42 UTC (permalink / raw)
  To: Fei Lv; +Cc: miklos, linux-unionfs, linux-kernel, lianghuxu

On Fri, Aug 23, 2024 at 11:51 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 3:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
> > >
> > > For upper filesystem which does not enforce ordering on storing of
> > > metadata changes(e.g. ubifs), when overlayfs file is modified for
> > > the first time, copy up will create a copy of the lower file and
> > > its parent directories in the upper layer. Permission lost of the
> > > new upper parent directory was observed during power-cut stress test.
> > >
> > > Fix by adding new mount opion "fsync=strict", make sure data/metadata of
> > > copied up directory written to disk before renaming from tmp to final
> > > destination.
> > >
> > > Signed-off-by: Fei Lv <feilv@asrmicro.com>
> >
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> >
> > but I'd also like to wait for an ACK from Miklos on this feature.
> >
> > As for timing, we are in the middle of the merge window for 6.11-rc1,
> > so we have some time before this can be considered for 6.12.
> > I will be on vacation for most of this development cycle, so either
> > Miklos will be able to queue it for 6.12 or I may be able to do
> > near the end of the 6.11 cycle.
> >
>
> Miklos,
>
> Please let me know what you think of this approach to handle ubifs upper.
> If you like it, I can queue this up for v6.12.
>
> Thanks,
> Amir.
>
> >
> > > ---
> > > V1 -> V2:
> > >  1. change open flags from "O_LARGEFILE | O_WRONLY" to "O_RDONLY".
> > >  2. change mount option to "fsync=ordered/strict/volatile".
> > >  3. ovl_should_sync_strict() implies ovl_should_sync().
> > >  4. remove redundant ovl_should_sync_strict from ovl_copy_up_meta_inode_data.
> > >  5. update commit log.
> > >  6. update documentation overlayfs.rst.
> > >

Hi Fei,

I started to test this patch and it occured to me that we have no test
coverage for
the "volatile" feature.

Filesystem durability tests are not easy to write and I know that you
tested your
own use case, so I will not ask you to write a regression test as a
condition for merge,
but if you are willing to help, it would be very nice to add this test coverage.

There is already one overlayfs test in fstests (overlay/078) which
tests behavior
of overlayfs copy up during power cut (a.k.a shutdown).

One thing that I do request is that you confirm that you tested that the legacy
"volatile" mount option still works as before.
I saw that you took care of preserving the legacy mount option in display,
which is good practice.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* 答复: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-23 11:42     ` Amir Goldstein
@ 2024-08-26  6:56       ` Lv Fei(吕飞)
  2024-08-26 13:03         ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Lv Fei(吕飞) @ 2024-08-26  6:56 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: miklos@szeredi.hu, linux-unionfs@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Xu Lianghu(徐良虎)



> 发件人: Amir Goldstein [mailto:amir73il@gmail.com] 
> 发送时间: 2024年8月23日 19:43
> 收件人: Lv Fei(吕飞) <feilv@asrmicro.com>
> 抄送: miklos@szeredi.hu; linux-unionfs@vger.kernel.org; linux-kernel@vger.kernel.org; Xu Lianghu(徐良虎) <lianghuxu@asrmicro.> com>
> 主题: Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
> 
> On Fri, Aug 23, 2024 at 11:51 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Jul 22, 2024 at 3:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
> > > >
> > > > For upper filesystem which does not enforce ordering on storing of 
> > > > metadata changes(e.g. ubifs), when overlayfs file is modified for 
> > > > the first time, copy up will create a copy of the lower file and 
> > > > its parent directories in the upper layer. Permission lost of the 
> > > > new upper parent directory was observed during power-cut stress test.
> > > >
> > > > Fix by adding new mount opion "fsync=strict", make sure 

There is a typo here, "opion" should be "option", please help correct before merge.

> > > > data/metadata of copied up directory written to disk before 
> > > > renaming from tmp to final destination.
> > > >
> > > > Signed-off-by: Fei Lv <feilv@asrmicro.com>
> > >
> > > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > but I'd also like to wait for an ACK from Miklos on this feature.
> > >
> > > As for timing, we are in the middle of the merge window for 
> > > 6.11-rc1, so we have some time before this can be considered for 6.12.
> > > I will be on vacation for most of this development cycle, so either 
> > > Miklos will be able to queue it for 6.12 or I may be able to do near 
> > > the end of the 6.11 cycle.
> > >
> >
> > Miklos,
> >
> > Please let me know what you think of this approach to handle ubifs upper.
> > If you like it, I can queue this up for v6.12.
> >
> > Thanks,
> > Amir.
> >
> > >
> > > > ---
> > > > V1 -> V2:
> > > >  1. change open flags from "O_LARGEFILE | O_WRONLY" to "O_RDONLY".
> > > >  2. change mount option to "fsync=ordered/strict/volatile".
> > > >  3. ovl_should_sync_strict() implies ovl_should_sync().
> > > >  4. remove redundant ovl_should_sync_strict from ovl_copy_up_meta_inode_data.
> > > >  5. update commit log.
> > > >  6. update documentation overlayfs.rst.
> > > >
> 
> Hi Fei,
> 
> I started to test this patch and it occured to me that we have no test coverage for the "volatile" feature.
> 
> Filesystem durability tests are not easy to write and I know that you tested your own use case, so I will not ask you to write a regression test as a condition for merge, but if you are willing to help, it would be very nice to add this test coverage.

OK, I can have a try, need some time to study test suite. This is a new thing for me.

> 
> There is already one overlayfs test in fstests (overlay/078) which tests behavior of overlayfs copy up during power cut (a.k.a shutdown).

Do you mean overlay/078 in kernel/git/brauner/xfstests-dev.git ?

> 
> One thing that I do request is that you confirm that you tested that the legacy "volatile" mount option still works as before.

Yes, I tested basic function of "volatile" mount option with this patch.

> I saw that you took care of preserving the legacy mount option in display, which is good practice.
> 
> Thanks,
> Amir.

Thanks,
Fei

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-26  6:56       ` 答复: " Lv Fei(吕飞)
@ 2024-08-26 13:03         ` Amir Goldstein
  0 siblings, 0 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-08-26 13:03 UTC (permalink / raw)
  To: Lv Fei(吕飞)
  Cc: miklos@szeredi.hu, linux-unionfs@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Xu Lianghu(徐良虎)

On Mon, Aug 26, 2024 at 8:56 AM Lv Fei(吕飞) <feilv@asrmicro.com> wrote:
>
>
>
> > 发件人: Amir Goldstein [mailto:amir73il@gmail.com]
> > 发送时间: 2024年8月23日 19:43
> > 收件人: Lv Fei(吕飞) <feilv@asrmicro.com>
> > 抄送: miklos@szeredi.hu; linux-unionfs@vger.kernel.org; linux-kernel@vger.kernel.org; Xu Lianghu(徐良虎) <lianghuxu@asrmicro.> com>
> > 主题: Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
> >
> > On Fri, Aug 23, 2024 at 11:51 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Mon, Jul 22, 2024 at 3:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
> > > > >
> > > > > For upper filesystem which does not enforce ordering on storing of
> > > > > metadata changes(e.g. ubifs), when overlayfs file is modified for
> > > > > the first time, copy up will create a copy of the lower file and
> > > > > its parent directories in the upper layer. Permission lost of the
> > > > > new upper parent directory was observed during power-cut stress test.
> > > > >
> > > > > Fix by adding new mount opion "fsync=strict", make sure
>
> There is a typo here, "opion" should be "option", please help correct before merge.
>

No problem, but I am still waiting for Miklos to comment on this option.

> > > > > data/metadata of copied up directory written to disk before
> > > > > renaming from tmp to final destination.
> > > > >
> > > > > Signed-off-by: Fei Lv <feilv@asrmicro.com>
> > > >
> > > > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > > >
> > > > but I'd also like to wait for an ACK from Miklos on this feature.
> > > >
> > > > As for timing, we are in the middle of the merge window for
> > > > 6.11-rc1, so we have some time before this can be considered for 6.12.
> > > > I will be on vacation for most of this development cycle, so either
> > > > Miklos will be able to queue it for 6.12 or I may be able to do near
> > > > the end of the 6.11 cycle.
> > > >
> > >
> > > Miklos,
> > >
> > > Please let me know what you think of this approach to handle ubifs upper.
> > > If you like it, I can queue this up for v6.12.
> > >
> > > Thanks,
> > > Amir.
> > >
> > > >
> > > > > ---
> > > > > V1 -> V2:
> > > > >  1. change open flags from "O_LARGEFILE | O_WRONLY" to "O_RDONLY".
> > > > >  2. change mount option to "fsync=ordered/strict/volatile".
> > > > >  3. ovl_should_sync_strict() implies ovl_should_sync().
> > > > >  4. remove redundant ovl_should_sync_strict from ovl_copy_up_meta_inode_data.
> > > > >  5. update commit log.
> > > > >  6. update documentation overlayfs.rst.
> > > > >
> >
> > Hi Fei,
> >
> > I started to test this patch and it occured to me that we have no test coverage for the "volatile" feature.
> >
> > Filesystem durability tests are not easy to write and I know that you tested your own use case, so I will not ask you to write a regression test as a condition for merge, but if you are willing to help, it would be very nice to add this test coverage.
>
> OK, I can have a try, need some time to study test suite. This is a new thing for me.
>

Whenever you can.

> >
> > There is already one overlayfs test in fstests (overlay/078) which tests behavior of overlayfs copy up during power cut (a.k.a shutdown).
>
> Do you mean overlay/078 in kernel/git/brauner/xfstests-dev.git ?
>

Yes overlay/078, but upstream repo is
git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git

> >
> > One thing that I do request is that you confirm that you tested that the legacy "volatile" mount option still works as before.
>
> Yes, I tested basic function of "volatile" mount option with this patch.
>

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-07-22 13:56 ` Amir Goldstein
  2024-08-23  9:51   ` Amir Goldstein
@ 2024-08-26 15:59   ` Miklos Szeredi
  2024-08-27  8:46     ` Amir Goldstein
  1 sibling, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2024-08-26 15:59 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Fei Lv, linux-unionfs, linux-kernel, lianghuxu

On Mon, 22 Jul 2024 at 15:56, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
> >
> > For upper filesystem which does not enforce ordering on storing of
> > metadata changes(e.g. ubifs), when overlayfs file is modified for
> > the first time, copy up will create a copy of the lower file and
> > its parent directories in the upper layer. Permission lost of the
> > new upper parent directory was observed during power-cut stress test.
> >
> > Fix by adding new mount opion "fsync=strict", make sure data/metadata of
> > copied up directory written to disk before renaming from tmp to final
> > destination.
> >
> > Signed-off-by: Fei Lv <feilv@asrmicro.com>
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>
> but I'd also like to wait for an ACK from Miklos on this feature.

I'm okay with this.  I'm a little confused about sync=strict mode,
since most copy ups will have vfs_fsync() called twice.  Is this what
we want, or could this be consolidated into a single fsync?

Also is it worth optimizing away the fsync on the directory in cases
the filesystem is well behaved?  Maybe we should just move the
vfs_fsync() call into ovl_copy_up_metadata() and omit the complexity
related to the additional mount option?

To me it feels that it shouldn't matter in terms of performance, but
if reports of performance regressions come in, we can still make this
optional.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-26 15:59   ` Miklos Szeredi
@ 2024-08-27  8:46     ` Amir Goldstein
  2024-08-29 10:29       ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-08-27  8:46 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Fei Lv, linux-unionfs, linux-kernel, lianghuxu

On Mon, Aug 26, 2024 at 5:59 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 22 Jul 2024 at 15:56, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
> > >
> > > For upper filesystem which does not enforce ordering on storing of
> > > metadata changes(e.g. ubifs), when overlayfs file is modified for
> > > the first time, copy up will create a copy of the lower file and
> > > its parent directories in the upper layer. Permission lost of the
> > > new upper parent directory was observed during power-cut stress test.
> > >
> > > Fix by adding new mount opion "fsync=strict", make sure data/metadata of
> > > copied up directory written to disk before renaming from tmp to final
> > > destination.
> > >
> > > Signed-off-by: Fei Lv <feilv@asrmicro.com>
> >
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> >
> > but I'd also like to wait for an ACK from Miklos on this feature.
>
> I'm okay with this.  I'm a little confused about sync=strict mode,
> since most copy ups will have vfs_fsync() called twice.  Is this what
> we want, or could this be consolidated into a single fsync?
>

Maybe it could, but remember that ubifs strict mode is the odd case
if we have an extra fsync for the odd case, I think code simplicity is
a more important factor.

> Also is it worth optimizing away the fsync on the directory in cases
> the filesystem is well behaved?  Maybe we should just move the
> vfs_fsync() call into ovl_copy_up_metadata() and omit the complexity
> related to the additional mount option?
>

Hmm. Maybe you are confused by the commit message that only mentions
fsync of the parent directory (same as the reported reproducer), but
the strict mode fsync also affects metacopy, not only parent dir copy up.

> To me it feels that it shouldn't matter in terms of performance, but
> if reports of performance regressions come in, we can still make this
> optional.
>

I think that the case of chown -R with metacopy is going to be terribly crippled
if every metacopy gets and fsync.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-27  8:46     ` Amir Goldstein
@ 2024-08-29 10:29       ` Amir Goldstein
  2024-08-29 12:51         ` Miklos Szeredi
  0 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-08-29 10:29 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Fei Lv, linux-unionfs, linux-kernel, lianghuxu

On Tue, Aug 27, 2024 at 10:46 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Aug 26, 2024 at 5:59 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Mon, 22 Jul 2024 at 15:56, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Mon, Jul 22, 2024 at 1:14 PM Fei Lv <feilv@asrmicro.com> wrote:
> > > >
> > > > For upper filesystem which does not enforce ordering on storing of
> > > > metadata changes(e.g. ubifs), when overlayfs file is modified for
> > > > the first time, copy up will create a copy of the lower file and
> > > > its parent directories in the upper layer. Permission lost of the
> > > > new upper parent directory was observed during power-cut stress test.
> > > >
> > > > Fix by adding new mount opion "fsync=strict", make sure data/metadata of
> > > > copied up directory written to disk before renaming from tmp to final
> > > > destination.
> > > >
> > > > Signed-off-by: Fei Lv <feilv@asrmicro.com>
> > >
> > > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > but I'd also like to wait for an ACK from Miklos on this feature.
> >
> > I'm okay with this.  I'm a little confused about sync=strict mode,
> > since most copy ups will have vfs_fsync() called twice.  Is this what
> > we want, or could this be consolidated into a single fsync?
> >
>
> Maybe it could, but remember that ubifs strict mode is the odd case
> if we have an extra fsync for the odd case, I think code simplicity is
> a more important factor.
>
> > Also is it worth optimizing away the fsync on the directory in cases
> > the filesystem is well behaved?  Maybe we should just move the
> > vfs_fsync() call into ovl_copy_up_metadata() and omit the complexity
> > related to the additional mount option?
> >

Maybe, but note that in ovl_copy_up_meta_inode_data(),
copy up of data still requires fsync and there is no call to
ovl_copy_up_metadata() in that code path, so trying to optimize
double fsync in all the code paths in going to be a PITA IMA
and not worth the trouble.

>
> Hmm. Maybe you are confused by the commit message that only mentions
> fsync of the parent directory (same as the reported reproducer), but
> the strict mode fsync also affects metacopy, not only parent dir copy up.
>
> > To me it feels that it shouldn't matter in terms of performance, but
> > if reports of performance regressions come in, we can still make this
> > optional.
> >
>
> I think that the case of chown -R with metacopy is going to be terribly crippled
> if every metacopy gets and fsync.
>

But maybe we can ignore crash safety of metacopy on ubifs, because
1. the ubifs users may not be using this feature
2. ubifs may be nice and takes care of ordering O_TMPFILE
    metadata updates before exposing the link

Then we can do the following:
IF (metacopy_enabled)
    fsync only in ovl_copy_up_file()
ELSE
    fsync only in ovl_copy_up_metadata()

Let me know what you think.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-29 10:29       ` Amir Goldstein
@ 2024-08-29 12:51         ` Miklos Szeredi
  2024-08-29 16:23           ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2024-08-29 12:51 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Fei Lv, linux-unionfs, linux-kernel, lianghuxu

On Thu, 29 Aug 2024 at 12:29, Amir Goldstein <amir73il@gmail.com> wrote:

> But maybe we can ignore crash safety of metacopy on ubifs, because
> 1. the ubifs users may not be using this feature
> 2. ubifs may be nice and takes care of ordering O_TMPFILE
>     metadata updates before exposing the link
>
> Then we can do the following:
> IF (metacopy_enabled)
>     fsync only in ovl_copy_up_file()
> ELSE
>     fsync only in ovl_copy_up_metadata()
>
> Let me know what you think.

Sounds like a good compromise.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-29 12:51         ` Miklos Szeredi
@ 2024-08-29 16:23           ` Amir Goldstein
  2024-08-30  9:26             ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-08-29 16:23 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Fei Lv, linux-unionfs, linux-kernel, lianghuxu

[-- Attachment #1: Type: text/plain, Size: 917 bytes --]

On Thu, Aug 29, 2024 at 2:51 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 29 Aug 2024 at 12:29, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > But maybe we can ignore crash safety of metacopy on ubifs, because
> > 1. the ubifs users may not be using this feature
> > 2. ubifs may be nice and takes care of ordering O_TMPFILE
> >     metadata updates before exposing the link
> >
> > Then we can do the following:
> > IF (metacopy_enabled)
> >     fsync only in ovl_copy_up_file()
> > ELSE
> >     fsync only in ovl_copy_up_metadata()
> >
> > Let me know what you think.
>
> Sounds like a good compromise.
>

Fei,

Could you please test the attached patch and confirm that your
use case does not depend on metacopy enabled?

In any case, I am holding on to your patch in case someone reports
a performance regression with this unconditional fsync approach.

Thanks,
Amir.

[-- Attachment #2: 0001-ovl-fsync-after-metadata-copy-up.patch --]
[-- Type: text/x-patch, Size: 4389 bytes --]

From 6ddc5bbdd2010d90d7eca0abb340eaeeafafc38a Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@gmail.com>
Date: Thu, 29 Aug 2024 17:51:08 +0200
Subject: [PATCH] ovl: fsync after metadata copy-up

For upper filesystems which do not use strict ordering of persisting
metadata changes (e.g. ubifs), when overlayfs file is modified for
the first time, copy up will create a copy of the lower file and
its parent directories in the upper layer. Permission lost of the
new upper parent directory was observed during power-cut stress test.

Fix by moving the fsync call after metadata copy to make sure that the
metadata copied up directory and files persists to disk before renaming
from tmp to final destination.

With metacopy enabled, this change will hurt performance of workloads
such as chown -R, so we keep the legacy behavior of fsync on data copyup
and never fsync on copyup of metadata of files and directories.

Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxj-pOvmw1-uXR3qVdqtLjSkwcR9nVKcNU_vC10Zyf2miQ@mail.gmail.com/
Reported-by: Fei Lv <feilv@asrmicro.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/copy_up.c | 43 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index a5ef2005a2cc..e9348ac1548a 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -243,8 +243,24 @@ static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
 	return 0;
 }
 
+static int ovl_sync_file(struct path *path)
+{
+	struct file *new_file;
+	int err;
+
+	new_file = ovl_path_open(path, O_RDONLY);
+	if (IS_ERR(new_file))
+		return PTR_ERR(new_file);
+
+	err = vfs_fsync(new_file, 0);
+	fput(new_file);
+
+	return err;
+}
+
 static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
-			    struct file *new_file, loff_t len)
+			    struct file *new_file, loff_t len,
+			    bool datasync)
 {
 	struct path datapath;
 	struct file *old_file;
@@ -342,7 +358,8 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
 
 		len -= bytes;
 	}
-	if (!error && ovl_should_sync(ofs))
+	/* call fsync once, either now or later along with metadata */
+	if (!error && ovl_should_sync(ofs) && datasync)
 		error = vfs_fsync(new_file, 0);
 out_fput:
 	fput(old_file);
@@ -574,6 +591,7 @@ struct ovl_copy_up_ctx {
 	bool indexed;
 	bool metacopy;
 	bool metacopy_digest;
+	bool metadata_fsync;
 };
 
 static int ovl_link_up(struct ovl_copy_up_ctx *c)
@@ -634,7 +652,8 @@ static int ovl_copy_up_data(struct ovl_copy_up_ctx *c, const struct path *temp)
 	if (IS_ERR(new_file))
 		return PTR_ERR(new_file);
 
-	err = ovl_copy_up_file(ofs, c->dentry, new_file, c->stat.size);
+	err = ovl_copy_up_file(ofs, c->dentry, new_file, c->stat.size,
+			       !c->metadata_fsync);
 	fput(new_file);
 
 	return err;
@@ -701,6 +720,10 @@ static int ovl_copy_up_metadata(struct ovl_copy_up_ctx *c, struct dentry *temp)
 		err = ovl_set_attr(ofs, temp, &c->stat);
 	inode_unlock(temp->d_inode);
 
+	/* fsync metadata before move into upper dir */
+	if (!err && ovl_should_sync(ofs) && c->metadata_fsync)
+		err = ovl_sync_file(&upperpath);
+
 	return err;
 }
 
@@ -860,7 +883,8 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
 
 	temp = tmpfile->f_path.dentry;
 	if (!c->metacopy && c->stat.size) {
-		err = ovl_copy_up_file(ofs, c->dentry, tmpfile, c->stat.size);
+		err = ovl_copy_up_file(ofs, c->dentry, tmpfile, c->stat.size,
+				       !c->metadata_fsync);
 		if (err)
 			goto out_fput;
 	}
@@ -1135,6 +1159,17 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
 	    !kgid_has_mapping(current_user_ns(), ctx.stat.gid))
 		return -EOVERFLOW;
 
+	/*
+	 * With metacopy disabled, we fsync after final metadata copyup, for
+	 * both regular and non-regular files to get atomic copyup semantics
+	 * on filesystems that do not use strict metadata ordering (e.g. ubifs).
+	 *
+	 * With metacopy enabled we want to avoid fsync on all meta copyup
+	 * that will hurt performance of workloads such as chown -R, so we
+	 * only fsync on data copyup and never fsync on copyup of non-regular
+	 * files and directories.
+	 */
+	ctx.metadata_fsync = !OVL_FS(dentry->d_sb)->config.metacopy;
 	ctx.metacopy = ovl_need_meta_copy_up(dentry, ctx.stat.mode, flags);
 
 	if (parent) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-29 16:23           ` Amir Goldstein
@ 2024-08-30  9:26             ` Amir Goldstein
  2024-08-30 11:52               ` 答复: " Lv Fei(吕飞)
  0 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2024-08-30  9:26 UTC (permalink / raw)
  To: Miklos Szeredi, Fei Lv; +Cc: linux-unionfs, linux-kernel, lianghuxu

On Thu, Aug 29, 2024 at 6:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Aug 29, 2024 at 2:51 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Thu, 29 Aug 2024 at 12:29, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > But maybe we can ignore crash safety of metacopy on ubifs, because
> > > 1. the ubifs users may not be using this feature
> > > 2. ubifs may be nice and takes care of ordering O_TMPFILE
> > >     metadata updates before exposing the link
> > >
> > > Then we can do the following:
> > > IF (metacopy_enabled)
> > >     fsync only in ovl_copy_up_file()
> > > ELSE
> > >     fsync only in ovl_copy_up_metadata()
> > >
> > > Let me know what you think.
> >
> > Sounds like a good compromise.
> >
>
> Fei,
>
> Could you please test the attached patch and confirm that your
> use case does not depend on metacopy enabled?
>
> In any case, I am holding on to your patch in case someone reports
> a performance regression with this unconditional fsync approach.
>

Well, it's a good thing that I took Miklois' advice to make the fsync
option implicit, because the original patch had 2 bugs detected by fstest:
1. missing O_LARGEFILE
2. trying to fsync special files

Please see uptodate patch at:
https://github.com/amir73il/linux/commits/ovl-fsync/

If there are no complaints, I will queue this up for v6.12.
Fei, please provide your Tested-by.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* 答复: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-30  9:26             ` Amir Goldstein
@ 2024-08-30 11:52               ` Lv Fei(吕飞)
  2024-08-30 12:23                 ` Amir Goldstein
  0 siblings, 1 reply; 14+ messages in thread
From: Lv Fei(吕飞) @ 2024-08-30 11:52 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: linux-unionfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Xu Lianghu(徐良虎)

> On Thu, Aug 29, 2024 at 6:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Thu, Aug 29, 2024 at 2:51 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Thu, 29 Aug 2024 at 12:29, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > But maybe we can ignore crash safety of metacopy on ubifs, because 
> > > > 1. the ubifs users may not be using this feature 2. ubifs may be 
> > > > nice and takes care of ordering O_TMPFILE
> > > >     metadata updates before exposing the link
> > > >
> > > > Then we can do the following:
> > > > IF (metacopy_enabled)
> > > >     fsync only in ovl_copy_up_file() ELSE
> > > >     fsync only in ovl_copy_up_metadata()
> > > >
> > > > Let me know what you think.
> > >
> > > Sounds like a good compromise.
> > >
> >
> > Fei,
> >
> > Could you please test the attached patch and confirm that your use 
> > case does not depend on metacopy enabled?
> >
> > In any case, I am holding on to your patch in case someone reports a 
> > performance regression with this unconditional fsync approach.
> >
> 
> Well, it's a good thing that I took Miklois' advice to make the fsync option implicit, because > the original patch had 2 bugs detected by fstest:
> 1. missing O_LARGEFILE
> 2. trying to fsync special files
> 
> Please see uptodate patch at:
> https://github.com/amir73il/linux/commits/ovl-fsync/
> 
> If there are no complaints, I will queue this up for v6.12.
> Fei, please provide your Tested-by.

We do not enable metacopy.
Tested this patch and it also solved our issue.

Thanks,
Fei


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict"
  2024-08-30 11:52               ` 答复: " Lv Fei(吕飞)
@ 2024-08-30 12:23                 ` Amir Goldstein
  0 siblings, 0 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-08-30 12:23 UTC (permalink / raw)
  To: Lv Fei(吕飞)
  Cc: Miklos Szeredi, linux-unionfs@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Xu Lianghu(徐良虎)

On Fri, Aug 30, 2024 at 1:52 PM Lv Fei(吕飞) <feilv@asrmicro.com> wrote:
>
> > On Thu, Aug 29, 2024 at 6:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Thu, Aug 29, 2024 at 2:51 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > On Thu, 29 Aug 2024 at 12:29, Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > > But maybe we can ignore crash safety of metacopy on ubifs, because
> > > > > 1. the ubifs users may not be using this feature 2. ubifs may be
> > > > > nice and takes care of ordering O_TMPFILE
> > > > >     metadata updates before exposing the link
> > > > >
> > > > > Then we can do the following:
> > > > > IF (metacopy_enabled)
> > > > >     fsync only in ovl_copy_up_file() ELSE
> > > > >     fsync only in ovl_copy_up_metadata()
> > > > >
> > > > > Let me know what you think.
> > > >
> > > > Sounds like a good compromise.
> > > >
> > >
> > > Fei,
> > >
> > > Could you please test the attached patch and confirm that your use
> > > case does not depend on metacopy enabled?
> > >
> > > In any case, I am holding on to your patch in case someone reports a
> > > performance regression with this unconditional fsync approach.
> > >
> >
> > Well, it's a good thing that I took Miklois' advice to make the fsync option implicit, because > the original patch had 2 bugs detected by fstest:
> > 1. missing O_LARGEFILE
> > 2. trying to fsync special files
> >
> > Please see uptodate patch at:
> > https://github.com/amir73il/linux/commits/ovl-fsync/
> >
> > If there are no complaints, I will queue this up for v6.12.
> > Fei, please provide your Tested-by.
>
> We do not enable metacopy.
> Tested this patch and it also solved our issue.

Hi Fei,

Thanks for approving.
I added Reported-and-tested-by and pushed to overlayfs-next.

Now we just need to hope that users won't come shouting about
performance regressions.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-08-30 12:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-22 10:14 [PATCH V2] ovl: fsync after metadata copy-up via mount option "fsync=strict" Fei Lv
2024-07-22 13:56 ` Amir Goldstein
2024-08-23  9:51   ` Amir Goldstein
2024-08-23 11:42     ` Amir Goldstein
2024-08-26  6:56       ` 答复: " Lv Fei(吕飞)
2024-08-26 13:03         ` Amir Goldstein
2024-08-26 15:59   ` Miklos Szeredi
2024-08-27  8:46     ` Amir Goldstein
2024-08-29 10:29       ` Amir Goldstein
2024-08-29 12:51         ` Miklos Szeredi
2024-08-29 16:23           ` Amir Goldstein
2024-08-30  9:26             ` Amir Goldstein
2024-08-30 11:52               ` 答复: " Lv Fei(吕飞)
2024-08-30 12:23                 ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox