* [RFC 1/6] fs: add frozen sb state helpers
2025-03-26 11:22 [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
@ 2025-03-26 11:22 ` Luis Chamberlain
2025-03-26 11:22 ` [RFC 2/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl() Luis Chamberlain
` (5 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Luis Chamberlain @ 2025-03-26 11:22 UTC (permalink / raw)
To: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, Luis Chamberlain
Provide helpers so that we can check a superblock's frozen state.
This will make subsequent changes easier to read. This makes
no functional changes.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
fs/ext4/ext4_jbd2.c | 2 +-
fs/gfs2/sys.c | 2 +-
fs/quota/quota.c | 3 ++-
fs/super.c | 8 ++++----
fs/xfs/xfs_trans.c | 3 +--
include/linux/fs.h | 22 ++++++++++++++++++++++
6 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 135e278c832e..5f5c2121d2ad 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -75,7 +75,7 @@ static int ext4_journal_check_start(struct super_block *sb)
if (WARN_ON_ONCE(sb_rdonly(sb)))
return -EROFS;
- WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
+ WARN_ON(sb_is_frozen(sb));
journal = EXT4_SB(sb)->s_journal;
/*
* Special case here: if the journal has aborted behind our
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index ecc699f8d9fc..08ec5904a208 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -156,7 +156,7 @@ static ssize_t uuid_show(struct gfs2_sbd *sdp, char *buf)
static ssize_t freeze_show(struct gfs2_sbd *sdp, char *buf)
{
struct super_block *sb = sdp->sd_vfs;
- int frozen = (sb->s_writers.frozen == SB_UNFROZEN) ? 0 : 1;
+ int frozen = sb_is_unfrozen(sb) ? 0 : 1;
return snprintf(buf, PAGE_SIZE, "%d\n", frozen);
}
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 7c2b75a44485..9b4e0a80f386 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -890,11 +890,12 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
sb = user_get_super(dev, excl);
if (!sb)
return ERR_PTR(-ENODEV);
- if (thawed && sb->s_writers.frozen != SB_UNFROZEN) {
+ if (thawed && !sb_is_unfrozen(sb)) {
if (excl)
up_write(&sb->s_umount);
else
up_read(&sb->s_umount);
+
/* Wait for sb to unfreeze */
sb_start_write(sb);
sb_end_write(sb);
diff --git a/fs/super.c b/fs/super.c
index 97a17f9d9023..117bd1bfe09f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1029,7 +1029,7 @@ int reconfigure_super(struct fs_context *fc)
if (fc->sb_flags_mask & ~MS_RMT_MASK)
return -EINVAL;
- if (sb->s_writers.frozen != SB_UNFROZEN)
+ if (!sb_is_unfrozen(sb))
return -EBUSY;
retval = security_sb_remount(sb, fc->security);
@@ -1053,7 +1053,7 @@ int reconfigure_super(struct fs_context *fc)
__super_lock_excl(sb);
if (!sb->s_root)
return 0;
- if (sb->s_writers.frozen != SB_UNFROZEN)
+ if (!sb_is_unfrozen(sb))
return -EBUSY;
remount_ro = !sb_rdonly(sb);
}
@@ -2009,7 +2009,7 @@ int freeze_super(struct super_block *sb, enum freeze_holder who)
atomic_inc(&sb->s_active);
retry:
- if (sb->s_writers.frozen == SB_FREEZE_COMPLETE) {
+ if (sb_is_frozen(sb)) {
if (may_freeze(sb, who))
ret = !!WARN_ON_ONCE(freeze_inc(sb, who) == 1);
else
@@ -2019,7 +2019,7 @@ int freeze_super(struct super_block *sb, enum freeze_holder who)
return ret;
}
- if (sb->s_writers.frozen != SB_UNFROZEN) {
+ if (!sb_is_unfrozen(sb)) {
ret = wait_for_partially_frozen(sb);
if (ret) {
deactivate_locked_super(sb);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index c6657072361a..3a5088865064 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -269,8 +269,7 @@ xfs_trans_alloc(
* Zero-reservation ("empty") transactions can't modify anything, so
* they're allowed to run while we're frozen.
*/
- WARN_ON(resp->tr_logres > 0 &&
- mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
+ WARN_ON(resp->tr_logres > 0 && sb_is_frozen(mp->m_super));
ASSERT(!(flags & XFS_TRANS_RES_FDBLKS) ||
xfs_has_lazysbcount(mp));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 016b0fe1536e..1d9a9c557e1a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1971,6 +1971,28 @@ static inline bool sb_start_intwrite_trylock(struct super_block *sb)
return __sb_start_write_trylock(sb, SB_FREEZE_FS);
}
+/**
+ * sb_is_frozen - is superblock frozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is frozen.
+ */
+static inline bool sb_is_frozen(struct super_block *sb)
+{
+ return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
+}
+
+/**
+ * sb_is_unfrozen - is superblock unfrozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is unfrozen.
+ */
+static inline bool sb_is_unfrozen(struct super_block *sb)
+{
+ return sb->s_writers.frozen == SB_UNFROZEN;
+}
+
bool inode_owner_or_capable(struct mnt_idmap *idmap,
const struct inode *inode);
--
2.47.2
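
For readers following along, here is a minimal user-space sketch of the two helpers. It assumes the freeze levels defined in include/linux/fs.h (SB_UNFROZEN through SB_FREEZE_COMPLETE); the structures are stripped-down stand-ins rather than the kernel's, and sb_is_partially_frozen() is a hypothetical name used only for illustration. The point it tries to show is that the helpers are not complements: during a partial freeze both return false, so a "!= SB_UNFROZEN" test maps to !sb_is_unfrozen(sb), not to sb_is_frozen(sb).

```c
#include <assert.h>
#include <stdbool.h>

/* Freeze levels, mirroring include/linux/fs.h. */
enum {
	SB_UNFROZEN = 0,		/* FS is unfrozen */
	SB_FREEZE_WRITE = 1,		/* writes, dir ops, ioctls frozen */
	SB_FREEZE_PAGEFAULT = 2,	/* page faults stopped as well */
	SB_FREEZE_FS = 3,		/* for internal FS use */
	SB_FREEZE_COMPLETE = 4,		/* ->freeze_fs finished successfully */
};

/* Stand-ins for the kernel structures: only the field the helpers read. */
struct sb_writers { int frozen; };
struct super_block { struct sb_writers s_writers; };

static inline bool sb_is_frozen(const struct super_block *sb)
{
	return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
}

static inline bool sb_is_unfrozen(const struct super_block *sb)
{
	return sb->s_writers.frozen == SB_UNFROZEN;
}

/* Hypothetical helper: true while a freeze or thaw is still in progress. */
static inline bool sb_is_partially_frozen(const struct super_block *sb)
{
	return !sb_is_frozen(sb) && !sb_is_unfrozen(sb);
}

/* Walk every level and check which helper claims it. */
static void check_freeze_levels(void)
{
	struct super_block sb;

	for (int level = SB_UNFROZEN; level <= SB_FREEZE_COMPLETE; level++) {
		sb.s_writers.frozen = level;
		/* The two helpers are never true at the same time. */
		assert(!(sb_is_frozen(&sb) && sb_is_unfrozen(&sb)));
		/* Intermediate levels are exactly the partially frozen ones. */
		assert(sb_is_partially_frozen(&sb) ==
		       (level != SB_UNFROZEN && level != SB_FREEZE_COMPLETE));
	}
}
```

Calling check_freeze_levels() from a test harness exercises all five levels; the three intermediate ones are where the two helpers disagree with a simple negation.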
* [RFC 2/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()
2025-03-26 11:22 [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
2025-03-26 11:22 ` [RFC 1/6] fs: add frozen sb state helpers Luis Chamberlain
@ 2025-03-26 11:22 ` Luis Chamberlain
2025-03-26 13:17 ` Christian Brauner
2025-03-26 11:22 ` [RFC 3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing Luis Chamberlain
` (4 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Luis Chamberlain @ 2025-03-26 11:22 UTC (permalink / raw)
To: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, Luis Chamberlain
There are use cases where we wish to traverse the superblock list
but also capture errors, and in which case we want to avoid having
our callers issue a lock themselves since we can do the locking for
the callers. Provide an iterate_supers_excl() which calls a function
with the write lock held. If an error occurs we capture it and
propagate it.
Likewise there are use cases where we wish to traverse the superblock
list but in reverse order. The new iterate_supers_reverse_excl() helper
does this and also captures any errors encountered.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
fs/super.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 2 +
2 files changed, 93 insertions(+)
diff --git a/fs/super.c b/fs/super.c
index 117bd1bfe09f..9995546cf159 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -945,6 +945,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
spin_unlock(&sb_lock);
}
+/**
+ * iterate_supers_excl - exclusively call func for all active superblocks
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument. Returns 0 unless an error
+ * occurred on calling the function on any superblock.
+ */
+int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
+{
+ struct super_block *sb, *p = NULL;
+ int error = 0;
+
+ spin_lock(&sb_lock);
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (hlist_unhashed(&sb->s_instances))
+ continue;
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ down_write(&sb->s_umount);
+ if (sb->s_root && (sb->s_flags & SB_BORN)) {
+ error = f(sb, arg);
+ if (error) {
+ up_write(&sb->s_umount);
+ spin_lock(&sb_lock);
+ __put_super(sb);
+ break;
+ }
+ }
+ up_write(&sb->s_umount);
+
+ spin_lock(&sb_lock);
+ if (p)
+ __put_super(p);
+ p = sb;
+ }
+ if (p)
+ __put_super(p);
+ spin_unlock(&sb_lock);
+
+ return error;
+}
+
+/**
+ * iterate_supers_reverse_excl - exclusively calls func in reverse order
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument, in reverse order, and holding
+ * the s_umount write lock. Returns 0, or the first error returned by @f.
+ */
+int iterate_supers_reverse_excl(int (*f)(struct super_block *, void *),
+ void *arg)
+{
+ struct super_block *sb, *p = NULL;
+ int error = 0;
+
+ spin_lock(&sb_lock);
+ list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+ if (hlist_unhashed(&sb->s_instances))
+ continue;
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ down_write(&sb->s_umount);
+ if (sb->s_root && (sb->s_flags & SB_BORN)) {
+ error = f(sb, arg);
+ if (error) {
+ up_write(&sb->s_umount);
+ spin_lock(&sb_lock);
+ __put_super(sb);
+ break;
+ }
+ }
+ up_write(&sb->s_umount);
+
+ spin_lock(&sb_lock);
+ if (p)
+ __put_super(p);
+ p = sb;
+ }
+ if (p)
+ __put_super(p);
+ spin_unlock(&sb_lock);
+
+ return error;
+}
+
/**
* iterate_supers_type - call function for superblocks of given type
* @type: fs type
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1d9a9c557e1a..da17fd74961c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3538,6 +3538,8 @@ extern struct file_system_type *get_fs_type(const char *name);
extern void drop_super(struct super_block *sb);
extern void drop_super_exclusive(struct super_block *sb);
extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg);
+extern int iterate_supers_reverse_excl(int (*)(struct super_block *, void *), void *);
extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
--
2.47.2
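
The error-propagation contract described in the commit message can be sketched in user space as follows. This is only an illustration under simplifying assumptions: a plain array stands in for the super_blocks list, and the sb_lock, s_umount and s_count handling is left out entirely. The names struct item, iterate_items(), iterate_items_reverse() and visit_fail_on() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a superblock: an id and a visit counter. */
struct item {
	int id;
	int visited;
};

typedef int (*item_fn)(struct item *, void *);

/* Call f on items[0..n-1]; stop at and return the first non-zero result. */
static int iterate_items(struct item *items, size_t n, item_fn f, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int error = f(&items[i], arg);

		if (error)
			return error;
	}
	return 0;
}

/* Same contract, walking from the last item to the first. */
static int iterate_items_reverse(struct item *items, size_t n, item_fn f,
				 void *arg)
{
	for (size_t i = n; i-- > 0; ) {
		int error = f(&items[i], arg);

		if (error)
			return error;
	}
	return 0;
}

/* Example callback: count the visit, fail with -1 on the id in *arg. */
static int visit_fail_on(struct item *it, void *arg)
{
	const int *bad_id = arg;

	it->visited++;
	return (bad_id && it->id == *bad_id) ? -1 : 0;
}

static void demo_iterate(void)
{
	struct item items[] = { { 1, 0 }, { 2, 0 }, { 3, 0 } };
	int bad_id = 2;

	/* Forward: visits ids 1 and 2, stops before 3. */
	assert(iterate_items(items, 3, visit_fail_on, &bad_id) == -1);
	assert(items[0].visited == 1 && items[2].visited == 0);

	/* Reverse: visits ids 3 and 2, never reaches 1 again. */
	assert(iterate_items_reverse(items, 3, visit_fail_on, &bad_id) == -1);
	assert(items[0].visited == 1 && items[1].visited == 2);
}
```

In both directions the walk stops at the first failing callback, so entries past the failure point are never visited. A caller that needs to undo earlier work has to do so itself.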
* Re: [RFC 2/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()
2025-03-26 11:22 ` [RFC 2/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl() Luis Chamberlain
@ 2025-03-26 13:17 ` Christian Brauner
0 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2025-03-26 13:17 UTC (permalink / raw)
To: Luis Chamberlain
Cc: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song,
linux-fsdevel, linux-kernel, gost.dev
On Wed, Mar 26, 2025 at 04:22:16AM -0700, Luis Chamberlain wrote:
> There are use cases where we wish to traverse the superblock list
> but also capture errors, and in which case we want to avoid having
> our callers issue a lock themselves since we can do the locking for
> the callers. Provide a iterate_supers_excl() which calls a function
> with the write lock held. If an error occurs we capture it and
> propagate it.
>
> Likewise there are use cases where we wish to traverse the superblock
> list but in reverse order. The new iterate_supers_reverse_excl() helpers
> does this but also also captures any errors encountered.
>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> fs/super.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 2 +
> 2 files changed, 93 insertions(+)
>
> diff --git a/fs/super.c b/fs/super.c
> index 117bd1bfe09f..9995546cf159 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -945,6 +945,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
> spin_unlock(&sb_lock);
> }
>
> +/**
> + * iterate_supers_excl - exclusively call func for all active superblocks
> + * @f: function to call
> + * @arg: argument to pass to it
> + *
> + * Scans the superblock list and calls given function, passing it
> + * locked superblock and given argument. Returns 0 unless an error
> + * occurred on calling the function on any superblock.
> + */
> +int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
> +{
> + struct super_block *sb, *p = NULL;
> + int error = 0;
> +
> + spin_lock(&sb_lock);
> + list_for_each_entry(sb, &super_blocks, s_list) {
> + if (hlist_unhashed(&sb->s_instances))
> + continue;
> + sb->s_count++;
> + spin_unlock(&sb_lock);
> +
> + down_write(&sb->s_umount);
> + if (sb->s_root && (sb->s_flags & SB_BORN)) {
> + error = f(sb, arg);
> + if (error) {
> + up_write(&sb->s_umount);
> + spin_lock(&sb_lock);
> + __put_super(sb);
> + break;
> + }
> + }
> + up_write(&sb->s_umount);
This is wrong. Both the reverse and the regular iterator need to wait
for the superblock to be born or die:
void iterate_supers_excl(void (*f)(struct super_block *, void *), void *arg)
{
struct super_block *sb, *p = NULL;
spin_lock(&sb_lock);
list_for_each_entry{_reverse}(sb, &super_blocks, s_list) {
bool locked;
sb->s_count++;
spin_unlock(&sb_lock);
locked = super_lock(sb);
if (locked) {
if (sb->s_root)
f(sb, arg);
super_unlock(sb);
}
spin_lock(&sb_lock);
if (p)
__put_super(p);
p = sb;
}
if (p)
__put_super(p);
spin_unlock(&sb_lock);
}
> +
> + spin_lock(&sb_lock);
> + if (p)
> + __put_super(p);
> + p = sb;
> + }
> + if (p)
> + __put_super(p);
> + spin_unlock(&sb_lock);
> +
> + return error;
> +}
> +
> +/**
> + * iterate_supers_reverse_excl - exclusively calls func in reverse order
> + * @f: function to call
> + * @arg: argument to pass to it
> + *
> + * Scans the superblock list and calls given function, passing it
> + * locked superblock and given argument, in reverse order, and holding
> + * the s_umount write lock. Returns if an error occurred.
> + */
> +int iterate_supers_reverse_excl(int (*f)(struct super_block *, void *),
> + void *arg)
> +{
> + struct super_block *sb, *p = NULL;
> + int error = 0;
> +
> + spin_lock(&sb_lock);
> + list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> + if (hlist_unhashed(&sb->s_instances))
> + continue;
> + sb->s_count++;
> + spin_unlock(&sb_lock);
> +
> + down_write(&sb->s_umount);
> + if (sb->s_root && (sb->s_flags & SB_BORN)) {
> + error = f(sb, arg);
> + if (error) {
> + up_write(&sb->s_umount);
> + spin_lock(&sb_lock);
> + __put_super(sb);
> + break;
> + }
> + }
> + up_write(&sb->s_umount);
> +
> + spin_lock(&sb_lock);
> + if (p)
> + __put_super(p);
> + p = sb;
> + }
> + if (p)
> + __put_super(p);
> + spin_unlock(&sb_lock);
> +
> + return error;
> +}
> +
> /**
> * iterate_supers_type - call function for superblocks of given type
> * @type: fs type
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 1d9a9c557e1a..da17fd74961c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3538,6 +3538,8 @@ extern struct file_system_type *get_fs_type(const char *name);
> extern void drop_super(struct super_block *sb);
> extern void drop_super_exclusive(struct super_block *sb);
> extern void iterate_supers(void (*)(struct super_block *, void *), void *);
> +extern int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg);
> +extern int iterate_supers_reverse_excl(int (*)(struct super_block *, void *), void *);
> extern void iterate_supers_type(struct file_system_type *,
> void (*)(struct super_block *, void *), void *);
>
> --
> 2.47.2
>
* [RFC 3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing
2025-03-26 11:22 [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
2025-03-26 11:22 ` [RFC 1/6] fs: add frozen sb state helpers Luis Chamberlain
2025-03-26 11:22 ` [RFC 2/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl() Luis Chamberlain
@ 2025-03-26 11:22 ` Luis Chamberlain
2025-03-26 11:53 ` James Bottomley
2025-03-26 11:22 ` [RFC 4/6] ext4: replace kthread freezing with auto fs freezing Luis Chamberlain
` (3 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Luis Chamberlain @ 2025-03-26 11:22 UTC (permalink / raw)
To: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, Luis Chamberlain
Add support to automatically handle freezing and thawing filesystems
during the kernel's suspend/resume cycle.
This is needed so that we properly stop IO in flight without
races after userspace has been frozen. Without this we rely on
kthread freezing, and its semantics are loose and error prone.
For instance, even though a kthread may use try_to_freeze() and end
up being frozen, we have no way of being sure that everything that
has been spawned asynchronously from it (such as timers) has also
been stopped.
A long-term advantage of adding filesystem freeze / thaw support
during suspend / hibernation is that we may eventually be able to
drop the kernel's thread freezing completely, as it was originally
added to stop disk IO in flight as we hibernate or suspend.
This does not remove the superfluous freezer calls on all filesystems.
Each filesystem must remove all the kthread freezer stuff and peg
the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
flag.
Subsequent patches remove the kthread freezer usage from each
filesystem, one at a time to make all this work bisectable.
Once all filesystems remove the usage of the kthread freezer we
can remove the FS_AUTOFREEZE flag.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
fs/super.c | 50 ++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 14 ++++++++++++
kernel/power/process.c | 15 ++++++++++++-
3 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index 9995546cf159..7428f0b2251c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block *sb)
return 0;
}
EXPORT_SYMBOL_GPL(sb_init_dio_done_wq);
+
+#ifdef CONFIG_PM_SLEEP
+static bool super_should_freeze(struct super_block *sb)
+{
+ if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
+ return false;
+ /*
+ * We don't freeze virtual filesystems, we skip those filesystems with
+ * no backing device.
+ */
+ if (sb->s_bdi == &noop_backing_dev_info)
+ return false;
+
+ return true;
+}
+
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+ int error = 0;
+
+ if (!super_should_freeze(sb))
+ goto out;
+
+ pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
+
+ error = freeze_super(sb, false);
+ if (error && error != -EBUSY)
+ pr_notice("%s (%s): Unable to freeze, error=%d\n",
+ sb->s_type->name, sb->s_id, error);
+out:
+ return error;
+}
+
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+ int error = 0;
+
+ if (!super_should_freeze(sb))
+ goto out;
+
+ pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
+
+ error = thaw_super(sb, false);
+ if (error && error != -EBUSY)
+ pr_notice("%s (%s): Unable to unfreeze, error=%d\n",
+ sb->s_type->name, sb->s_id, error);
+out:
+ return error;
+}
+#endif /* CONFIG_PM_SLEEP */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index da17fd74961c..e0614c3d376e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2638,6 +2638,7 @@ struct file_system_type {
#define FS_MGTIME 64 /* FS uses multigrain timestamps */
#define FS_LBS 128 /* FS supports LBS */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
+#define FS_AUTOFREEZE (1<<16) /* temporary as we phase kthread freezer out */
int (*init_fs_context)(struct fs_context *);
const struct fs_parameter_spec *parameters;
struct dentry *(*mount) (struct file_system_type *, int,
@@ -2729,6 +2730,19 @@ extern int user_statfs(const char __user *, struct kstatfs *);
extern int fd_statfs(int, struct kstatfs *);
int freeze_super(struct super_block *super, enum freeze_holder who);
int thaw_super(struct super_block *super, enum freeze_holder who);
+#ifdef CONFIG_PM_SLEEP
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv);
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv);
+#else
+static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+ return 0;
+}
+static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+ return 0;
+}
+#endif
extern __printf(2, 3)
int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
extern int super_setup_bdi(struct super_block *sb);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 66ac067d9ae6..d0f540a89c39 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -140,6 +140,16 @@ int freeze_processes(void)
BUG_ON(in_atomic());
+ pr_info("Freezing filesystems ... ");
+ error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
+ if (error) {
+ pr_cont("failed\n");
+ iterate_supers_excl(fs_suspend_thaw_sb, NULL);
+ thaw_processes();
+ return error;
+ }
+ pr_cont("done.\n");
+
/*
* Now that the whole userspace is frozen we need to disable
* the OOM killer to disallow any further interference with
@@ -149,8 +159,10 @@ int freeze_processes(void)
if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
error = -EBUSY;
- if (error)
+ if (error) {
+ iterate_supers_excl(fs_suspend_thaw_sb, NULL);
thaw_processes();
+ }
return error;
}
@@ -188,6 +200,7 @@ void thaw_processes(void)
pm_nosig_freezing = false;
oom_killer_enable();
+ iterate_supers_excl(fs_suspend_thaw_sb, NULL);
pr_info("Restarting tasks ... ");
--
2.47.2
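
The freeze_processes() change above follows a freeze-in-reverse, thaw-everything-on-failure pattern. A minimal user-space sketch of that control flow, assuming an array in place of the superblock list and ignoring all locking, might look like this. The names struct fake_fs, fake_freeze(), fake_thaw() and suspend_freeze_all() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mounted filesystem. */
struct fake_fs {
	bool frozen;
	bool refuses_freeze;	/* simulate a freeze failure */
};

static int fake_freeze(struct fake_fs *fs)
{
	if (fs->refuses_freeze)
		return -1;
	fs->frozen = true;
	return 0;
}

static void fake_thaw(struct fake_fs *fs)
{
	fs->frozen = false;
}

/*
 * Freeze from the end of the array back to the start. On the first
 * failure, thaw every entry in forward order and report the error,
 * so the caller never sees a half-frozen set.
 */
static int suspend_freeze_all(struct fake_fs *fs, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		int error = fake_freeze(&fs[i]);

		if (error) {
			for (size_t j = 0; j < n; j++)
				fake_thaw(&fs[j]);
			return error;
		}
	}
	return 0;
}

static void demo_suspend(void)
{
	struct fake_fs fs[3] = { { false, false }, { false, true },
				 { false, false } };

	/* fs[2] freezes first, fs[1] refuses, and everything is rolled back. */
	assert(suspend_freeze_all(fs, 3) == -1);
	assert(!fs[0].frozen && !fs[1].frozen && !fs[2].frozen);
}
```

The rollback thaws every entry rather than only those already frozen, which matches the shape of the patch (it calls the thaw iterator over the whole list) but relies on thaw being harmless for an unfrozen entry in this toy model.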
* Re: [RFC 3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing
2025-03-26 11:22 ` [RFC 3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing Luis Chamberlain
@ 2025-03-26 11:53 ` James Bottomley
2025-03-26 14:09 ` Christian Brauner
0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2025-03-26 11:53 UTC (permalink / raw)
To: Luis Chamberlain, jack, hch, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev
On Wed, 2025-03-26 at 04:22 -0700, Luis Chamberlain wrote:
> Add support to automatically handle freezing and thawing filesystems
> during the kernel's suspend/resume cycle.
>
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
>
> A long term advantage of also adding filesystem freeze / thawing
> supporting during suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.
>
> This does not remove the superfluous freezer calls on all
> filesystems.
> Each filesystem must remove all the kthread freezer stuff and peg
> the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> flag.
>
> Subsequent patches remove the kthread freezer usage from each
> filesystem, one at a time to make all this work bisectable.
> Once all filesystems remove the usage of the kthread freezer we
> can remove the FS_AUTOFREEZE flag.
>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> fs/super.c | 50 ++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 14 ++++++++++++
> kernel/power/process.c | 15 ++++++++++++-
> 3 files changed, 78 insertions(+), 1 deletion(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index 9995546cf159..7428f0b2251c 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block *sb)
> return 0;
> }
> EXPORT_SYMBOL_GPL(sb_init_dio_done_wq);
> +
> +#ifdef CONFIG_PM_SLEEP
> +static bool super_should_freeze(struct super_block *sb)
> +{
> + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> + return false;
> + /*
> + * We don't freeze virtual filesystems, we skip those filesystems with
> + * no backing device.
> + */
> + if (sb->s_bdi == &noop_backing_dev_info)
> + return false;
This logic won't work for me because efivarfs is a pseudofilesystem and
will have a noop bdi (or simply a null s_bdev, which is easier to check
for). I was thinking of allowing freeze/thaw to continue for a s_bdev
== NULL filesystem if it provided a freeze or thaw callback, which will
cover efivarfs.
> +
> + return true;
> +}
> +
> +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> +{
> + int error = 0;
> +
> + if (!super_should_freeze(sb))
> + goto out;
> +
> + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> +
> + error = freeze_super(sb, false);
This is actually not wholly correct now. If the fs provides a
sb->freeze() method, you should use that instead of freeze_super() ... see
how fs_bdev_freeze() is doing it.
Additionally, the first thing freeze_super() does is take the
superblock lock exclusively. Since you've already taken it exclusively
in your iterate super, how does this not deadlock?
You also need to handle the hibernate deadlock I ran into where a
process (and some of the systemd processes are very fast at doing this)
touches the filesystem and gets blocked on uninterruptible wait before
the remainder of freeze_processes() runs. Once a task is
uninterruptible hibernate fails. I came up with a simplistic solution:
https://lore.kernel.org/linux-fsdevel/1af829aa7a65eb5ebc0614a00f7019615ed0f62b.camel@HansenPartnership.com/
But there should probably be a freezable percpu_rwsem that
sb_write_started() can use to get these semantics rather than making
every use of percpu_rwsem freezable.
Regards,
James
* Re: [RFC 3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing
2025-03-26 11:53 ` James Bottomley
@ 2025-03-26 14:09 ` Christian Brauner
2025-03-26 14:37 ` James Bottomley
0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2025-03-26 14:09 UTC (permalink / raw)
To: James Bottomley, Luis Chamberlain
Cc: jack, hch, david, rafael, djwong, pavel, song, linux-fsdevel,
linux-kernel, gost.dev
On Wed, Mar 26, 2025 at 07:53:10AM -0400, James Bottomley wrote:
> On Wed, 2025-03-26 at 04:22 -0700, Luis Chamberlain wrote:
> > Add support to automatically handle freezing and thawing filesystems
> > during the kernel's suspend/resume cycle.
> >
> > This is needed so that we properly really stop IO in flight without
> > races after userspace has been frozen. Without this we rely on
> > kthread freezing and its semantics are loose and error prone.
> > For instance, even though a kthread may use try_to_freeze() and end
> > up being frozen we have no way of being sure that everything that
> > has been spawned asynchronously from it (such as timers) have also
> > been stopped as well.
> >
> > A long term advantage of also adding filesystem freeze / thawing
> > supporting during suspend / hibernation is that long term we may
> > be able to eventually drop the kernel's thread freezing completely
> > as it was originally added to stop disk IO in flight as we hibernate
> > or suspend.
> >
> > This does not remove the superfluous freezer calls on all
> > filesystems.
> > Each filesystem must remove all the kthread freezer stuff and peg
> > the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> > flag.
> >
> > Subsequent patches remove the kthread freezer usage from each
> > filesystem, one at a time to make all this work bisectable.
> > Once all filesystems remove the usage of the kthread freezer we
> > can remove the FS_AUTOFREEZE flag.
> >
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> > fs/super.c | 50 ++++++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 14 ++++++++++++
> > kernel/power/process.c | 15 ++++++++++++-
> > 3 files changed, 78 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index 9995546cf159..7428f0b2251c 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -2279,3 +2279,53 @@ int sb_init_dio_done_wq(struct super_block *sb)
> > return 0;
> > }
> > EXPORT_SYMBOL_GPL(sb_init_dio_done_wq);
> > +
> > +#ifdef CONFIG_PM_SLEEP
> > +static bool super_should_freeze(struct super_block *sb)
> > +{
> > + if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> > + return false;
> > + /*
> > + * We don't freeze virtual filesystems, we skip those filesystems with
> > + * no backing device.
> > + */
> > + if (sb->s_bdi == &noop_backing_dev_info)
> > + return false;
>
>
> This logic won't work for me because efivarfs is a pseudofilesystem and
> will have a noop bdi (or simply a null s_bdev, which is easier to check
> for). I was thinking of allowing freeze/thaw to continue for a s_bdev
> == NULL filesystem if it provided a freeze or thaw callback, which will
> cover efivarfs.
Filesystem freezing isn't dependent on backing devices. I'm not sure
where that impression comes from. The FS_AUTOFREEZE shouldn't be
necessary once all filesystems have been fixed up (which I guess this is
about). The logic should just be similar to what we do for the freeze
ioctl.
IOW, we skip filesystems without any freeze method. That excludes any fs
that isn't prepared to be frozen:
The easiest way is very likely to give efivarfs a ->freeze_super() and
->thaw_super() method since it likely doesn't need all of the fanciness that
freeze_super() adds.
Then we have two approaches:
(1) Change the iterator to take a reference while holding the super_lock() and
then call a helper to freeze the fs.
(2) Pass the information that s_umount is held down to the freeze methods.
For example (2) would be something like:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index be3ad155ec9f..7ad515ad6934 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2272,6 +2272,7 @@ enum freeze_holder {
FREEZE_HOLDER_KERNEL = (1U << 0),
FREEZE_HOLDER_USERSPACE = (1U << 1),
FREEZE_MAY_NEST = (1U << 2),
+ FREEZE_SUPER_LOCKED = (1U << 3),
};
struct super_operations {
static int freeze_super_locked(struct super_block *sb)
{
/* If filesystem doesn't support freeze feature, return. */
if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
return 0;
if (sb->s_op->freeze_super)
return sb->s_op->freeze_super(sb, FREEZE_HOLDER_KERNEL | FREEZE_SUPER_LOCKED);
return freeze_super(sb, FREEZE_HOLDER_KERNEL | FREEZE_SUPER_LOCKED);
}
Why do you care about efivarfs taking part in system suspend though?
>
> > +
> > + return true;
> > +}
> > +
> > +int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
> > +{
> > + int error = 0;
> > +
> > + if (!super_should_freeze(sb))
> > + goto out;
> > +
> > + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> > +
> > + error = freeze_super(sb, false);
>
> This is actually not wholly correct now. If the fs provides a
> sb->freeze() method, you should use that instead of freeze_super() ... see
> how fs_bdev_freeze() is doing it.
>
> Additionally, the first thing freeze_super() does is take the
> superblock lock exclusively. Since you've already taken it exclusively
> in your iterate super, how does this not deadlock?
It will deadlock.
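
To make the deadlock and approach (2) concrete, here is a toy user-space model. It assumes a non-recursive exclusive lock in place of s_umount and uses a trylock so the "would deadlock" case returns an error instead of hanging. The flag values are taken from the enum proposed above; struct toy_lock, struct toy_sb and toy_freeze() are hypothetical names, and nothing here reflects how freeze_super() is actually structured.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as in the proposed enum freeze_holder; the rest is a toy. */
#define FREEZE_HOLDER_KERNEL	(1U << 0)
#define FREEZE_SUPER_LOCKED	(1U << 3)

/* Toy non-recursive exclusive lock: a second acquire while held fails. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;	/* a blocking lock would wait here forever */
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

struct toy_sb {
	struct toy_lock s_umount;
	bool frozen;
};

/*
 * Freeze the toy superblock. Without FREEZE_SUPER_LOCKED the function
 * takes s_umount itself; with it, the caller already holds the lock.
 * Returns 0 on success, -1 where a real blocking lock would deadlock.
 */
static int toy_freeze(struct toy_sb *sb, unsigned int who)
{
	bool locked_here = false;

	if (!(who & FREEZE_SUPER_LOCKED)) {
		if (!toy_trylock(&sb->s_umount))
			return -1;
		locked_here = true;
	}
	sb->frozen = true;
	if (locked_here)
		toy_unlock(&sb->s_umount);
	return 0;
}

static void demo_locked_freeze(void)
{
	struct toy_sb sb = { { false }, false };

	/* The iterator takes the lock before calling the freeze callback. */
	assert(toy_trylock(&sb.s_umount));

	/* Callback re-takes the lock: the deadlock case from the review. */
	assert(toy_freeze(&sb, FREEZE_HOLDER_KERNEL) == -1);

	/* Callback is told the lock is held: freeze proceeds. */
	assert(toy_freeze(&sb, FREEZE_HOLDER_KERNEL | FREEZE_SUPER_LOCKED) == 0);
	assert(sb.frozen && sb.s_umount.held);

	toy_unlock(&sb.s_umount);
}
```

The design choice in approach (2) is to keep lock ownership with the iterator and make the callee lock-aware; approach (1) instead drops the lock before calling the helper and relies on a held reference to keep the superblock alive.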
* Re: [RFC 3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing
2025-03-26 14:09 ` Christian Brauner
@ 2025-03-26 14:37 ` James Bottomley
0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2025-03-26 14:37 UTC (permalink / raw)
To: Christian Brauner, Luis Chamberlain
Cc: jack, hch, david, rafael, djwong, pavel, song, linux-fsdevel,
linux-kernel, gost.dev
On Wed, 2025-03-26 at 15:09 +0100, Christian Brauner wrote:
[...]
> Why do you care about efivarfs taking part in system suspend though?
I don't, I only care about intercepting thaw. If ->thaw_super can get
called on resume without me having to provide a freeze_super, then I'm
happy.
I'd still like to know whether the thaw is for suspend or hibernate,
though.
Regards,
James
* [RFC 4/6] ext4: replace kthread freezing with auto fs freezing
2025-03-26 11:22 [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
` (2 preceding siblings ...)
2025-03-26 11:22 ` [RFC 3/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing Luis Chamberlain
@ 2025-03-26 11:22 ` Luis Chamberlain
2025-03-26 17:57 ` Jan Kara
2025-03-26 11:22 ` [RFC 5/6] btrfs: " Luis Chamberlain
` (2 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Luis Chamberlain @ 2025-03-26 11:22 UTC (permalink / raw)
To: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, Luis Chamberlain
The kernel power management code now lets the VFS handle filesystem
freezing and thawing. Take advantage of that and remove the kthread
freezing. This is needed so that we properly stop in-flight IO
without races after userspace has been frozen. Without this we rely
on kthread freezing, whose semantics are loose and error prone.
The filesystem is therefore in charge of quiescing itself properly
through its callbacks if it thinks it knows better than the VFS.
The following Coccinelle rule was used to remove the now superfluous
freezer calls:
make coccicheck MODE=patch SPFLAGS="--in-place --no-show-diff" COCCI=./fs-freeze-cleanup.cocci M=fs/ext4
virtual patch
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
fs/ext4/mballoc.c | 2 +-
fs/ext4/super.c | 9 +++------
2 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 0d523e9fb3d5..ae235ec5ff3a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6782,7 +6782,7 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
static bool ext4_trim_interrupted(void)
{
- return fatal_signal_pending(current) || freezing(current);
+ return fatal_signal_pending(current);
}
static int ext4_try_to_trim_range(struct super_block *sb,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8cafcd3e9f5f..4241043262c8 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -136,7 +136,7 @@ static struct file_system_type ext2_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = ext4_kill_sb,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("ext2");
MODULE_ALIAS("ext2");
@@ -152,7 +152,7 @@ static struct file_system_type ext3_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = ext4_kill_sb,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
@@ -3776,7 +3776,6 @@ static int ext4_lazyinit_thread(void *arg)
unsigned long next_wakeup, cur;
BUG_ON(NULL == eli);
- set_freezable();
cont_thread:
while (true) {
@@ -3835,8 +3834,6 @@ static int ext4_lazyinit_thread(void *arg)
}
mutex_unlock(&eli->li_list_mtx);
- try_to_freeze();
-
cur = jiffies;
if (!next_wakeup_initialized || time_after_eq(cur, next_wakeup)) {
cond_resched();
@@ -7404,7 +7401,7 @@ static struct file_system_type ext4_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = ext4_kill_sb,
- .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("ext4");
--
2.47.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC 4/6] ext4: replace kthread freezing with auto fs freezing
2025-03-26 11:22 ` [RFC 4/6] ext4: replace kthread freezing with auto fs freezing Luis Chamberlain
@ 2025-03-26 17:57 ` Jan Kara
0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2025-03-26 17:57 UTC (permalink / raw)
To: Luis Chamberlain
Cc: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song,
linux-fsdevel, linux-kernel, gost.dev
On Wed 26-03-25 04:22:18, Luis Chamberlain wrote:
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 0d523e9fb3d5..ae235ec5ff3a 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6782,7 +6782,7 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
>
> static bool ext4_trim_interrupted(void)
> {
> - return fatal_signal_pending(current) || freezing(current);
> + return fatal_signal_pending(current);
> }
I think this is wrong. ext4_trim_interrupted() gets called from a normal
process that's doing fstrim (which can take a long time and we don't want
to block system suspend with it). So IMO this should stay as is.
Otherwise the patch looks good to me.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
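The concern raised in the review above can be illustrated with a small userspace model. It is a sketch under stated assumptions: `struct model_task`, `model_trim_interrupted()` and `model_trim_groups()` are hypothetical names, and the `freezing` flag merely stands in for what `freezing(current)` reports in the kernel. The point is that a long-running loop executed by an ordinary process only stops promptly for suspend if its interruption check also looks at the freezing state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the calling process' state. */
struct model_task {
	bool fatal_signal;	/* stands in for fatal_signal_pending(current) */
	bool freezing;		/* stands in for freezing(current) */
};

/* Mirrors the original check: stop on a fatal signal OR a pending freeze. */
static bool model_trim_interrupted(const struct model_task *t)
{
	return t->fatal_signal || t->freezing;
}

/*
 * Walks ngroups block groups, "trimming" each one. A suspend request is
 * simulated as arriving just before group freeze_at is processed; pass
 * freeze_at == ngroups to simulate no suspend at all.
 * Returns the number of groups trimmed before the loop stopped.
 */
static size_t model_trim_groups(struct model_task *t, size_t ngroups,
				size_t freeze_at)
{
	size_t done = 0;

	for (size_t g = 0; g < ngroups; g++) {
		if (g == freeze_at)
			t->freezing = true;
		if (model_trim_interrupted(t))
			break;	/* bail out so suspend is not held up */
		done++;
	}
	return done;
}
```

If the freezing test were dropped from `model_trim_interrupted()`, the loop in this model would run through all groups regardless of the simulated suspend request.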
^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC 5/6] btrfs: replace kthread freezing with auto fs freezing
2025-03-26 11:22 [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
` (3 preceding siblings ...)
2025-03-26 11:22 ` [RFC 4/6] ext4: replace kthread freezing with auto fs freezing Luis Chamberlain
@ 2025-03-26 11:22 ` Luis Chamberlain
2025-03-26 11:22 ` [RFC 6/6] xfs: " Luis Chamberlain
2025-03-26 11:42 ` [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
6 siblings, 0 replies; 14+ messages in thread
From: Luis Chamberlain @ 2025-03-26 11:22 UTC (permalink / raw)
To: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, Luis Chamberlain
The kernel power management code now lets the VFS handle filesystem
freezing and thawing. Take advantage of that and remove the kthread
freezing. This is needed so that we properly stop in-flight IO
without races after userspace has been frozen. Without this we rely
on kthread freezing, whose semantics are loose and error prone.
The filesystem is therefore in charge of quiescing itself properly
through its callbacks if it thinks it knows better than the VFS.
The following Coccinelle rule was used to remove the now superfluous
freezer calls:
make coccicheck MODE=patch SPFLAGS="--in-place --no-show-diff" COCCI=./fs-freeze-cleanup.cocci M=fs/btrfs
virtual patch
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
fs/btrfs/disk-io.c | 4 ++--
fs/btrfs/scrub.c | 2 +-
fs/btrfs/super.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3dd555db3d32..03332f914be7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1962,8 +1962,8 @@ static void btrfs_init_qgroup(struct btrfs_fs_info *fs_info)
static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
{
u32 max_active = fs_info->thread_pool_size;
- unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
- unsigned int ordered_flags = WQ_MEM_RECLAIM | WQ_FREEZABLE;
+ unsigned int flags = WQ_MEM_RECLAIM | WQ_UNBOUND;
+ unsigned int ordered_flags = WQ_MEM_RECLAIM;
fs_info->workers =
btrfs_alloc_workqueue(fs_info, "worker", flags, max_active, 16);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ae34353a34d9..52ef84923645 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2811,7 +2811,7 @@ static void scrub_workers_put(struct btrfs_fs_info *fs_info)
static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info)
{
struct workqueue_struct *scrub_workers = NULL;
- unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
+ unsigned int flags = WQ_UNBOUND;
int max_active = fs_info->thread_pool_size;
int ret = -ENOMEM;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 40709e2a44fc..153e8a2d7fbb 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2178,7 +2178,7 @@ static struct file_system_type btrfs_fs_type = {
.parameters = btrfs_fs_parameters,
.kill_sb = btrfs_kill_super,
.fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA |
- FS_ALLOW_IDMAP | FS_MGTIME,
+ FS_ALLOW_IDMAP | FS_MGTIME | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("btrfs");
--
2.47.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [RFC 6/6] xfs: replace kthread freezing with auto fs freezing
2025-03-26 11:22 [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
` (4 preceding siblings ...)
2025-03-26 11:22 ` [RFC 5/6] btrfs: " Luis Chamberlain
@ 2025-03-26 11:22 ` Luis Chamberlain
2025-03-26 11:42 ` [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
6 siblings, 0 replies; 14+ messages in thread
From: Luis Chamberlain @ 2025-03-26 11:22 UTC (permalink / raw)
To: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, Luis Chamberlain
The kernel power management code now lets the VFS handle filesystem
freezing and thawing. Take advantage of that and remove the kthread
freezing. This is needed so that we properly stop in-flight IO
without races after userspace has been frozen. Without this we rely
on kthread freezing, whose semantics are loose and error prone.
The filesystem is therefore in charge of quiescing itself properly
through its callbacks if it thinks it knows better than the VFS.
The following Coccinelle rule was used to remove the now superfluous
freezer calls:
make coccicheck MODE=patch SPFLAGS="--in-place --no-show-diff" COCCI=./fs-freeze-cleanup.cocci M=fs/xfs
virtual patch
@ remove_set_freezable @
expression time;
statement S, S2;
expression task, current;
@@
(
- set_freezable();
|
- if (try_to_freeze())
- continue;
|
- try_to_freeze();
|
- freezable_schedule();
+ schedule();
|
- freezable_schedule_timeout(time);
+ schedule_timeout(time);
|
- if (freezing(task)) { S }
|
- if (freezing(task)) { S }
- else
{ S2 }
|
- freezing(current)
)
@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@
(
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE,
+ WQ_ARG2,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+ WQ_ARG2 | WQ_ARG3,
...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
- WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+ WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
...);
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE
+ WQ_ARG1
|
WQ_E =
- WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+ WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+ WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
- WQ_FREEZABLE | WQ_ARG2
+ WQ_ARG2
)
|
fs_wq_fn(
- WQ_FREEZABLE
+ 0
)
)
@ add_auto_flag @
expression E1;
identifier fs_type;
@@
struct file_system_type fs_type = {
.fs_flags = E1
+ | FS_AUTOFREEZE
,
};
Generated-by: Coccinelle SmPL
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
fs/xfs/xfs_discard.c | 2 +-
fs/xfs/xfs_log.c | 3 +--
fs/xfs/xfs_log_cil.c | 2 +-
fs/xfs/xfs_mru_cache.c | 2 +-
fs/xfs/xfs_pwork.c | 2 +-
fs/xfs/xfs_super.c | 16 ++++++++--------
fs/xfs/xfs_trans_ail.c | 3 ---
fs/xfs/xfs_zone_gc.c | 2 --
8 files changed, 13 insertions(+), 19 deletions(-)
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index c1a306268ae4..1596cf0ecb9b 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -333,7 +333,7 @@ xfs_trim_gather_extents(
static bool
xfs_trim_should_stop(void)
{
- return fatal_signal_pending(current) || freezing(current);
+ return fatal_signal_pending(current);
}
/*
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 6493bdb57351..317f6db292fb 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1489,8 +1489,7 @@ xlog_alloc_log(
log->l_iclog->ic_prev = prev_iclog; /* re-write 1st prev ptr */
log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM |
- WQ_HIGHPRI),
+ XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_HIGHPRI),
0, mp->m_super->s_id);
if (!log->l_ioend_workqueue)
goto out_free_iclog;
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 1ca406ec1b40..8ff5d68394e6 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -1932,7 +1932,7 @@ xlog_cil_init(
* concurrency the log spinlocks will be exposed to.
*/
cil->xc_push_wq = alloc_workqueue("xfs-cil/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
+ XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_UNBOUND),
4, log->l_mp->m_super->s_id);
if (!cil->xc_push_wq)
goto out_destroy_cil;
diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index d0f5b403bdbe..c9a49c6f6129 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -293,7 +293,7 @@ int
xfs_mru_cache_init(void)
{
xfs_mru_reap_wq = alloc_workqueue("xfs_mru_cache",
- XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 1);
+ XFS_WQFLAGS(WQ_MEM_RECLAIM), 1);
if (!xfs_mru_reap_wq)
return -ENOMEM;
return 0;
diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
index c283b801cc5d..3f5bf53f8778 100644
--- a/fs/xfs/xfs_pwork.c
+++ b/fs/xfs/xfs_pwork.c
@@ -72,7 +72,7 @@ xfs_pwork_init(
trace_xfs_pwork_init(mp, nr_threads, current->pid);
pctl->wq = alloc_workqueue("%s-%d",
- WQ_UNBOUND | WQ_SYSFS | WQ_FREEZABLE, nr_threads, tag,
+ WQ_UNBOUND | WQ_SYSFS, nr_threads, tag,
current->pid);
if (!pctl->wq)
return -ENOMEM;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index b2dd0c0bf509..4fae48072ef3 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -565,37 +565,37 @@ xfs_init_mount_workqueues(
struct xfs_mount *mp)
{
mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+ XFS_WQFLAGS(WQ_MEM_RECLAIM),
1, mp->m_super->s_id);
if (!mp->m_buf_workqueue)
goto out;
mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+ XFS_WQFLAGS(WQ_MEM_RECLAIM),
0, mp->m_super->s_id);
if (!mp->m_unwritten_workqueue)
goto out_destroy_buf;
mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+ XFS_WQFLAGS(WQ_MEM_RECLAIM),
0, mp->m_super->s_id);
if (!mp->m_reclaim_workqueue)
goto out_destroy_unwritten;
mp->m_blockgc_wq = alloc_workqueue("xfs-blockgc/%s",
- XFS_WQFLAGS(WQ_UNBOUND | WQ_FREEZABLE | WQ_MEM_RECLAIM),
+ XFS_WQFLAGS(WQ_UNBOUND | WQ_MEM_RECLAIM),
0, mp->m_super->s_id);
if (!mp->m_blockgc_wq)
goto out_destroy_reclaim;
mp->m_inodegc_wq = alloc_workqueue("xfs-inodegc/%s",
- XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+ XFS_WQFLAGS(WQ_MEM_RECLAIM),
1, mp->m_super->s_id);
if (!mp->m_inodegc_wq)
goto out_destroy_blockgc;
mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
- XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
+ XFS_WQFLAGS(0), 0, mp->m_super->s_id);
if (!mp->m_sync_workqueue)
goto out_destroy_inodegc;
@@ -2228,7 +2228,7 @@ static struct file_system_type xfs_fs_type = {
.parameters = xfs_fs_parameters,
.kill_sb = xfs_kill_sb,
.fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
- FS_LBS,
+ FS_LBS | FS_AUTOFREEZE,
};
MODULE_ALIAS_FS("xfs");
@@ -2500,7 +2500,7 @@ xfs_init_workqueues(void)
* max_active value for this workqueue.
*/
xfs_alloc_wq = alloc_workqueue("xfsalloc",
- XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 0);
+ XFS_WQFLAGS(WQ_MEM_RECLAIM), 0);
if (!xfs_alloc_wq)
return -ENOMEM;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 0fcb1828e598..ad8183db0780 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -636,7 +636,6 @@ xfsaild(
unsigned int noreclaim_flag;
noreclaim_flag = memalloc_noreclaim_save();
- set_freezable();
while (1) {
/*
@@ -695,8 +694,6 @@ xfsaild(
__set_current_state(TASK_RUNNING);
- try_to_freeze();
-
tout = xfsaild_push(ailp);
}
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index c5136ea9bb1d..1875b6551ab0 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -993,7 +993,6 @@ xfs_zone_gc_handle_work(
}
__set_current_state(TASK_RUNNING);
- try_to_freeze();
if (reset_list)
xfs_zone_gc_reset_zones(data, reset_list);
@@ -1041,7 +1040,6 @@ xfs_zoned_gcd(
unsigned int nofs_flag;
nofs_flag = memalloc_nofs_save();
- set_freezable();
for (;;) {
set_current_state(TASK_INTERRUPTIBLE | TASK_FREEZABLE);
--
2.47.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC 0/6] fs: automatic kernel fs freeze / thaw
2025-03-26 11:22 [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
` (5 preceding siblings ...)
2025-03-26 11:22 ` [RFC 6/6] xfs: " Luis Chamberlain
@ 2025-03-26 11:42 ` Luis Chamberlain
2025-03-26 12:27 ` James Bottomley
6 siblings, 1 reply; 14+ messages in thread
From: Luis Chamberlain @ 2025-03-26 11:42 UTC (permalink / raw)
To: jack, hch, James.Bottomley, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, amir73il
On Wed, Mar 26, 2025 at 04:22:14AM -0700, Luis Chamberlain wrote:
> I did a quick boot test with this on my laptop and suspend doesn't work.
> It's not clear whether this is an artifact of trying this on linux-next;
> I can try next without my patches to see if it actually suspends
> without them. So we have to figure out whether there's something
> stupid still to fix, or something broken with these changes that I
> overlooked on the rebase.
next-20250321 has suspend broken, so it was not my patches that broke
suspend. We first need a baseline on a kernel revision where suspend is
not broken.
Luis
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 0/6] fs: automatic kernel fs freeze / thaw
2025-03-26 11:42 ` [RFC 0/6] fs: automatic kernel fs freeze / thaw Luis Chamberlain
@ 2025-03-26 12:27 ` James Bottomley
0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2025-03-26 12:27 UTC (permalink / raw)
To: Luis Chamberlain, jack, hch, david, rafael, djwong, pavel, song
Cc: linux-fsdevel, linux-kernel, gost.dev, amir73il
On Wed, 2025-03-26 at 04:42 -0700, Luis Chamberlain wrote:
> On Wed, Mar 26, 2025 at 04:22:14AM -0700, Luis Chamberlain wrote:
> > I did a quick boot test with this on my laptop and suspend doesn't
> > work. It's not clear whether this is an artifact of trying this on
> > linux-next; I can try next without my patches to see if it actually
> > suspends without them. So we have to figure out whether there's
> > something stupid still to fix, or something broken with these
> > changes that I overlooked on the rebase.
>
> next-20250321 has suspend broken, so it was not my patches that broke
> suspend. We first need a baseline on a kernel revision where suspend
> is not broken.
I wrote a much lighter weight version of this and tested it on 6.14 (or
actually 6.14-rc6 since my laptop takes ages to do a full kernel
compile), which is where I found the sb_write_started() deadlock Jan
predicted. But with that fixed, at least hibernate works for me using
an ext4 based image, so if you use 6.14 as your base, it should work
for you.
Regards,
James
^ permalink raw reply [flat|nested] 14+ messages in thread