* [PATCH v1] ext4: fix use-after-free in update_super_work when racing with umount
@ 2026-03-13 6:52 Jiayuan Chen
2026-03-17 11:38 ` Jan Kara
0 siblings, 1 reply; 4+ messages in thread
From: Jiayuan Chen @ 2026-03-13 6:52 UTC (permalink / raw)
To: linux-ext4
Cc: Jiayuan Chen, Theodore Ts'o, Andreas Dilger, Jan Kara,
Ritesh Harjani, Ye Bin, linux-kernel
Commit b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
moved ext4_unregister_sysfs() before flushing s_sb_upd_work to prevent new
error work from being queued via /proc/fs/ext4/xx/mb_groups reads during
unmount. However, this introduced a use-after-free because
update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which
accesses the kobject's kernfs_node after it has been freed:
update_super_work ext4_put_super
----------------- --------------
ext4_unregister_sysfs(sb)
kobject_del(&sbi->s_kobj)
__kobject_del()
sysfs_remove_dir()
kobj->sd = NULL
sysfs_put(sd)
kernfs_put() // RCU free
ext4_notify_error_sysfs(sbi)
sysfs_notify(&sbi->s_kobj)
kn = kobj->sd // stale pointer
kernfs_get(kn) // UAF on freed kernfs_node
ext4_journal_destroy()
flush_work(&sbi->s_sb_upd_work)
The original blamed commit needed procfs removal before the work
flush to prevent /proc/fs/ext4/xx/mb_groups reads from queuing new error
work. But it bundled procfs removal and kobject_del together in
ext4_unregister_sysfs(), causing the sysfs kobject to be torn down too
early.
The correct teardown ordering has three constraints:
1. procfs removal must happen before flushing s_sb_upd_work, to prevent
/proc reads from queuing new error work that would BUG_ON.
2. sysfs kobject removal must happen after flushing s_sb_upd_work, since
the work calls sysfs_notify() which accesses the kernfs_node.
3. sysfs kobject removal must happen before jbd2_journal_destroy(), since
userspace could read the journal_task sysfs attribute and dereference
j_task after the journal thread has been killed.
Fix this by:
- Adding ext4_sb_release_proc() to remove procfs entries early.
- Splitting ext4_journal_destroy() into ext4_journal_stop_work() and
ext4_journal_finish(), so that ext4_unregister_sysfs() can be placed
between them to satisfy all three ordering constraints.
Fixes: b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
fs/ext4/ext4.h | 1 +
fs/ext4/ext4_jbd2.h | 45 ++++++++++++++++++++++++++++-----------------
fs/ext4/super.c | 34 +++++++++++++++++++++++-----------
fs/ext4/sysfs.c | 10 ++++++++++
4 files changed, 62 insertions(+), 28 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b76966dc06c0..a693365d224c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3757,6 +3757,7 @@ extern const struct inode_operations ext4_fast_symlink_inode_operations;
/* sysfs.c */
extern void ext4_notify_error_sysfs(struct ext4_sb_info *sbi);
extern int ext4_register_sysfs(struct super_block *sb);
+extern void ext4_sb_release_proc(struct super_block *sb);
extern void ext4_unregister_sysfs(struct super_block *sb);
extern int __init ext4_init_sysfs(void);
extern void ext4_exit_sysfs(void);
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..2b7d68b11578 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -430,32 +430,43 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
}
/*
- * Pass journal explicitly as it may not be cached in the sbi->s_journal in some
- * cases
+ * Stop new s_sb_upd_work from being queued and flush any pending work.
+ *
+ * At this point only two things can be operating on the journal:
+ * JBD2 thread performing transaction commit and s_sb_upd_work
+ * issuing sb update through the journal. Once we set
+ * EXT4_MF_JOURNAL_DESTROY, new ext4_handle_error() calls will not
+ * queue s_sb_upd_work and ext4_force_commit() makes sure any
+ * ext4_handle_error() calls from the running transaction commit are
+ * finished. Hence no new s_sb_upd_work can be queued after we
+ * flush it here.
*/
-static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *journal)
+static inline void ext4_journal_stop_work(struct ext4_sb_info *sbi)
{
- int err = 0;
-
- /*
- * At this point only two things can be operating on the journal.
- * JBD2 thread performing transaction commit and s_sb_upd_work
- * issuing sb update through the journal. Once we set
- * EXT4_JOURNAL_DESTROY, new ext4_handle_error() calls will not
- * queue s_sb_upd_work and ext4_force_commit() makes sure any
- * ext4_handle_error() calls from the running transaction commit are
- * finished. Hence no new s_sb_upd_work can be queued after we
- * flush it here.
- */
ext4_set_mount_flag(sbi->s_sb, EXT4_MF_JOURNAL_DESTROY);
-
ext4_force_commit(sbi->s_sb);
flush_work(&sbi->s_sb_upd_work);
+}
+
+/*
+ * Destroy the journal. Must be called after ext4_journal_stop_work().
+ * Pass journal explicitly as it may not be cached in the sbi->s_journal
+ * in some cases.
+ */
+static inline int ext4_journal_finish(struct ext4_sb_info *sbi,
+ journal_t *journal)
+{
+ int err;
err = jbd2_journal_destroy(journal);
sbi->s_journal = NULL;
-
return err;
}
+static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *journal)
+{
+ ext4_journal_stop_work(sbi);
+ return ext4_journal_finish(sbi, journal);
+}
+
#endif /* _EXT4_JBD2_H */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 752f414aa06b..9bba783f44e1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1280,16 +1280,12 @@ static void ext4_put_super(struct super_block *sb)
int err;
/*
- * Unregister sysfs before destroying jbd2 journal.
- * Since we could still access attr_journal_task attribute via sysfs
- * path which could have sbi->s_journal->j_task as NULL
- * Unregister sysfs before flush sbi->s_sb_upd_work.
- * Since user may read /proc/fs/ext4/xx/mb_groups during umount, If
- * read metadata verify failed then will queue error work.
- * update_super_work will call start_this_handle may trigger
- * BUG_ON.
+ * Remove procfs entries before flush s_sb_upd_work. Since user may
+ * read /proc/fs/ext4/xx/mb_groups during umount, if read metadata
+ * verify failed then will queue error work. update_super_work will
+ * call start_this_handle which may trigger BUG_ON.
*/
- ext4_unregister_sysfs(sb);
+ ext4_sb_release_proc(sb);
if (___ratelimit(&ext4_mount_msg_ratelimit, "EXT4-fs unmount"))
ext4_msg(sb, KERN_INFO, "unmounting filesystem %pU.",
@@ -1301,14 +1297,30 @@ static void ext4_put_super(struct super_block *sb)
destroy_workqueue(sbi->rsv_conversion_wq);
ext4_release_orphan_info(sb);
+ /*
+ * Flush s_sb_upd_work before unregistering sysfs, since
+ * update_super_work calls ext4_notify_error_sysfs() which accesses
+ * the kobject's kernfs_node via sysfs_notify(). Unregistering sysfs
+ * before the flush could lead to a use-after-free on the
+ * kernfs_node.
+ *
+ * Also unregister sysfs before destroying jbd2 journal, since
+ * userspace could read the journal_task sysfs attribute while
+ * jbd2_journal_destroy() is killing the journal thread, leading to
+ * a NULL pointer dereference of j_task in journal_task_show().
+ */
if (sbi->s_journal) {
aborted = is_journal_aborted(sbi->s_journal);
- err = ext4_journal_destroy(sbi, sbi->s_journal);
+ ext4_journal_stop_work(sbi);
+ ext4_unregister_sysfs(sb);
+ err = ext4_journal_finish(sbi, sbi->s_journal);
if ((err < 0) && !aborted) {
ext4_abort(sb, -err, "Couldn't clean up the journal");
}
- } else
+ } else {
flush_work(&sbi->s_sb_upd_work);
+ ext4_unregister_sysfs(sb);
+ }
ext4_es_unregister_shrinker(sbi);
timer_shutdown_sync(&sbi->s_err_report);
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index d2ecc1026c0c..f6947416c1e7 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -638,6 +638,16 @@ int ext4_register_sysfs(struct super_block *sb)
return 0;
}
+void ext4_sb_release_proc(struct super_block *sb)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+ if (sbi->s_proc) {
+ remove_proc_subtree(sb->s_id, ext4_proc_root);
+ sbi->s_proc = NULL;
+ }
+}
+
void ext4_unregister_sysfs(struct super_block *sb)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v1] ext4: fix use-after-free in update_super_work when racing with umount
2026-03-13 6:52 [PATCH v1] ext4: fix use-after-free in update_super_work when racing with umount Jiayuan Chen
@ 2026-03-17 11:38 ` Jan Kara
2026-03-18 5:04 ` Jiayuan Chen
0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2026-03-17 11:38 UTC (permalink / raw)
To: Jiayuan Chen
Cc: linux-ext4, Theodore Ts'o, Andreas Dilger, Jan Kara,
Ritesh Harjani, Ye Bin, linux-kernel
On Fri 13-03-26 14:52:04, Jiayuan Chen wrote:
> Commit b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
> moved ext4_unregister_sysfs() before flushing s_sb_upd_work to prevent new
> error work from being queued via /proc/fs/ext4/xx/mb_groups reads during
> unmount. However, this introduced a use-after-free because
> update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which
> accesses the kobject's kernfs_node after it has been freed:
>
> update_super_work ext4_put_super
> ----------------- --------------
> ext4_unregister_sysfs(sb)
> kobject_del(&sbi->s_kobj)
> __kobject_del()
> sysfs_remove_dir()
> kobj->sd = NULL
> sysfs_put(sd)
> kernfs_put() // RCU free
> ext4_notify_error_sysfs(sbi)
> sysfs_notify(&sbi->s_kobj)
> kn = kobj->sd // stale pointer
> kernfs_get(kn) // UAF on freed kernfs_node
> ext4_journal_destroy()
> flush_work(&sbi->s_sb_upd_work)
>
> The original blamed commit needed procfs removal before the work
> flush to prevent /proc/fs/ext4/xx/mb_groups reads from queuing new error
> work. But it bundled procfs removal and kobject_del together in
> ext4_unregister_sysfs(), causing the sysfs kobject to be torn down too
> early.
>
> The correct teardown ordering has three constraints:
>
> 1. procfs removal must happen before flushing s_sb_upd_work, to prevent
> /proc reads from queuing new error work that would BUG_ON.
> 2. sysfs kobject removal must happen after flushing s_sb_upd_work, since
> the work calls sysfs_notify() which accesses the kernfs_node.
> 3. sysfs kobject removal must happen before jbd2_journal_destroy(), since
> userspace could read the journal_task sysfs attribute and dereference
> j_task after the journal thread has been killed.
>
> Fix this by:
> - Adding ext4_sb_release_proc() to remove procfs entries early.
> - Splitting ext4_journal_destroy() into ext4_journal_stop_work() and
> ext4_journal_finish(), so that ext4_unregister_sysfs() can be placed
> between them to satisfy all three ordering constraints.
>
> Fixes: b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
> Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Thanks for the analysis and the patch! I fully agree with your analysis but
I think your solution shows that the teardown sequence is just too subtle
(too many different dependencies). Ideally, we'd instead modify
ext4_notify_error_sysfs() to detect sysfs is already torn down by
ext4_unregister_sysfs() and do nothing. We can check
sbi->s_kobj.state_in_sysfs for that. The only trouble with this is that
sysfs_notify() could still race with kobject_del() so we also need some
locking in ext4_unregister_sysfs() locking out parallel
ext4_notify_error_sysfs() and we probably need to introduce a separate
mutex for that. What do you think?
Honza
> ---
> fs/ext4/ext4.h | 1 +
> fs/ext4/ext4_jbd2.h | 45 ++++++++++++++++++++++++++++-----------------
> fs/ext4/super.c | 34 +++++++++++++++++++++++-----------
> fs/ext4/sysfs.c | 10 ++++++++++
> 4 files changed, 62 insertions(+), 28 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b76966dc06c0..a693365d224c 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3757,6 +3757,7 @@ extern const struct inode_operations ext4_fast_symlink_inode_operations;
> /* sysfs.c */
> extern void ext4_notify_error_sysfs(struct ext4_sb_info *sbi);
> extern int ext4_register_sysfs(struct super_block *sb);
> +extern void ext4_sb_release_proc(struct super_block *sb);
> extern void ext4_unregister_sysfs(struct super_block *sb);
> extern int __init ext4_init_sysfs(void);
> extern void ext4_exit_sysfs(void);
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 63d17c5201b5..2b7d68b11578 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -430,32 +430,43 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
> }
>
> /*
> - * Pass journal explicitly as it may not be cached in the sbi->s_journal in some
> - * cases
> + * Stop new s_sb_upd_work from being queued and flush any pending work.
> + *
> + * At this point only two things can be operating on the journal:
> + * JBD2 thread performing transaction commit and s_sb_upd_work
> + * issuing sb update through the journal. Once we set
> + * EXT4_MF_JOURNAL_DESTROY, new ext4_handle_error() calls will not
> + * queue s_sb_upd_work and ext4_force_commit() makes sure any
> + * ext4_handle_error() calls from the running transaction commit are
> + * finished. Hence no new s_sb_upd_work can be queued after we
> + * flush it here.
> */
> -static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *journal)
> +static inline void ext4_journal_stop_work(struct ext4_sb_info *sbi)
> {
> - int err = 0;
> -
> - /*
> - * At this point only two things can be operating on the journal.
> - * JBD2 thread performing transaction commit and s_sb_upd_work
> - * issuing sb update through the journal. Once we set
> - * EXT4_JOURNAL_DESTROY, new ext4_handle_error() calls will not
> - * queue s_sb_upd_work and ext4_force_commit() makes sure any
> - * ext4_handle_error() calls from the running transaction commit are
> - * finished. Hence no new s_sb_upd_work can be queued after we
> - * flush it here.
> - */
> ext4_set_mount_flag(sbi->s_sb, EXT4_MF_JOURNAL_DESTROY);
> -
> ext4_force_commit(sbi->s_sb);
> flush_work(&sbi->s_sb_upd_work);
> +}
> +
> +/*
> + * Destroy the journal. Must be called after ext4_journal_stop_work().
> + * Pass journal explicitly as it may not be cached in the sbi->s_journal
> + * in some cases.
> + */
> +static inline int ext4_journal_finish(struct ext4_sb_info *sbi,
> + journal_t *journal)
> +{
> + int err;
>
> err = jbd2_journal_destroy(journal);
> sbi->s_journal = NULL;
> -
> return err;
> }
>
> +static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *journal)
> +{
> + ext4_journal_stop_work(sbi);
> + return ext4_journal_finish(sbi, journal);
> +}
> +
> #endif /* _EXT4_JBD2_H */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 752f414aa06b..9bba783f44e1 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1280,16 +1280,12 @@ static void ext4_put_super(struct super_block *sb)
> int err;
>
> /*
> - * Unregister sysfs before destroying jbd2 journal.
> - * Since we could still access attr_journal_task attribute via sysfs
> - * path which could have sbi->s_journal->j_task as NULL
> - * Unregister sysfs before flush sbi->s_sb_upd_work.
> - * Since user may read /proc/fs/ext4/xx/mb_groups during umount, If
> - * read metadata verify failed then will queue error work.
> - * update_super_work will call start_this_handle may trigger
> - * BUG_ON.
> + * Remove procfs entries before flush s_sb_upd_work. Since user may
> + * read /proc/fs/ext4/xx/mb_groups during umount, if read metadata
> + * verify failed then will queue error work. update_super_work will
> + * call start_this_handle which may trigger BUG_ON.
> */
> - ext4_unregister_sysfs(sb);
> + ext4_sb_release_proc(sb);
>
> if (___ratelimit(&ext4_mount_msg_ratelimit, "EXT4-fs unmount"))
> ext4_msg(sb, KERN_INFO, "unmounting filesystem %pU.",
> @@ -1301,14 +1297,30 @@ static void ext4_put_super(struct super_block *sb)
> destroy_workqueue(sbi->rsv_conversion_wq);
> ext4_release_orphan_info(sb);
>
> + /*
> + * Flush s_sb_upd_work before unregistering sysfs, since
> + * update_super_work calls ext4_notify_error_sysfs() which accesses
> + * the kobject's kernfs_node via sysfs_notify(). Unregistering sysfs
> + * before the flush could lead to a use-after-free on the
> + * kernfs_node.
> + *
> + * Also unregister sysfs before destroying jbd2 journal, since
> + * userspace could read the journal_task sysfs attribute while
> + * jbd2_journal_destroy() is killing the journal thread, leading to
> + * a NULL pointer dereference of j_task in journal_task_show().
> + */
> if (sbi->s_journal) {
> aborted = is_journal_aborted(sbi->s_journal);
> - err = ext4_journal_destroy(sbi, sbi->s_journal);
> + ext4_journal_stop_work(sbi);
> + ext4_unregister_sysfs(sb);
> + err = ext4_journal_finish(sbi, sbi->s_journal);
> if ((err < 0) && !aborted) {
> ext4_abort(sb, -err, "Couldn't clean up the journal");
> }
> - } else
> + } else {
> flush_work(&sbi->s_sb_upd_work);
> + ext4_unregister_sysfs(sb);
> + }
>
> ext4_es_unregister_shrinker(sbi);
> timer_shutdown_sync(&sbi->s_err_report);
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index d2ecc1026c0c..f6947416c1e7 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -638,6 +638,16 @@ int ext4_register_sysfs(struct super_block *sb)
> return 0;
> }
>
> +void ext4_sb_release_proc(struct super_block *sb)
> +{
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> +
> + if (sbi->s_proc) {
> + remove_proc_subtree(sb->s_id, ext4_proc_root);
> + sbi->s_proc = NULL;
> + }
> +}
> +
> void ext4_unregister_sysfs(struct super_block *sb)
> {
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> --
> 2.43.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v1] ext4: fix use-after-free in update_super_work when racing with umount
2026-03-17 11:38 ` Jan Kara
@ 2026-03-18 5:04 ` Jiayuan Chen
2026-03-18 16:56 ` Jan Kara
0 siblings, 1 reply; 4+ messages in thread
From: Jiayuan Chen @ 2026-03-18 5:04 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, Theodore Ts'o, Andreas Dilger, Ritesh Harjani,
Ye Bin, linux-kernel
On 3/17/26 7:38 PM, Jan Kara wrote:
> On Fri 13-03-26 14:52:04, Jiayuan Chen wrote:
>> Commit b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
>> moved ext4_unregister_sysfs() before flushing s_sb_upd_work to prevent new
>> error work from being queued via /proc/fs/ext4/xx/mb_groups reads during
>> unmount. However, this introduced a use-after-free because
>> update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which
>> accesses the kobject's kernfs_node after it has been freed:
>>
>> update_super_work ext4_put_super
>> ----------------- --------------
>> ext4_unregister_sysfs(sb)
>> kobject_del(&sbi->s_kobj)
>> __kobject_del()
>> sysfs_remove_dir()
>> kobj->sd = NULL
>> sysfs_put(sd)
>> kernfs_put() // RCU free
>> ext4_notify_error_sysfs(sbi)
>> sysfs_notify(&sbi->s_kobj)
>> kn = kobj->sd // stale pointer
>> kernfs_get(kn) // UAF on freed kernfs_node
>> ext4_journal_destroy()
>> flush_work(&sbi->s_sb_upd_work)
>>
>> The original blamed commit needed procfs removal before the work
>> flush to prevent /proc/fs/ext4/xx/mb_groups reads from queuing new error
>> work. But it bundled procfs removal and kobject_del together in
>> ext4_unregister_sysfs(), causing the sysfs kobject to be torn down too
>> early.
>>
>> The correct teardown ordering has three constraints:
>>
>> 1. procfs removal must happen before flushing s_sb_upd_work, to prevent
>> /proc reads from queuing new error work that would BUG_ON.
>> 2. sysfs kobject removal must happen after flushing s_sb_upd_work, since
>> the work calls sysfs_notify() which accesses the kernfs_node.
>> 3. sysfs kobject removal must happen before jbd2_journal_destroy(), since
>> userspace could read the journal_task sysfs attribute and dereference
>> j_task after the journal thread has been killed.
>>
>> Fix this by:
>> - Adding ext4_sb_release_proc() to remove procfs entries early.
>> - Splitting ext4_journal_destroy() into ext4_journal_stop_work() and
>> ext4_journal_finish(), so that ext4_unregister_sysfs() can be placed
>> between them to satisfy all three ordering constraints.
>>
>> Fixes: b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
>> Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
>> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Thanks for the analysis and the patch! I fully agree with your analysis but
> I think your solution shows that the teardown sequence is just too subtle
> (too many different dependencies). Ideally, we'd instead modify
> ext4_notify_error_sysfs() to detect sysfs is already torn down by
> ext4_unregister_sysfs() and do nothing. We can check
> sbi->s_kobj.state_in_sysfs for that. The only trouble with this is that
> sysfs_notify() could still race with kobject_del() so we also need some
> locking in ext4_unregister_sysfs() locking out parallel
> ext4_notify_error_sysfs() and we probably need to introduce a separate
> mutex for that. What do you think?
>
> Honza
>
Hi Honza,
Thanks for the suggestion !
I tried the approach you described — having ext4_notify_error_sysfs()
check s_kobj.state_in_sysfs and using a
dedicated mutex to serialize against kobject_del() in
ext4_unregister_sysfs(). It is indeed much simpler than my original
approach of reordering the teardown sequence.
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b76966dc06c0..c012e85d2515 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1574,6 +1574,7 @@ struct ext4_sb_info {
struct proc_dir_entry *s_proc;
struct kobject s_kobj;
struct completion s_kobj_unregister;
+ struct mutex s_error_notify_mutex; /* protects sysfs_notify vs
kobject_del */
struct super_block *s_sb;
struct buffer_head *s_mmp_bh;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index d2ecc1026c0c..f2505988e010 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -597,7 +597,10 @@ static const struct kobj_type ext4_feat_ktype = {
void ext4_notify_error_sysfs(struct ext4_sb_info *sbi)
{
- sysfs_notify(&sbi->s_kobj, NULL, "errors_count");
+ mutex_lock(&sbi->s_error_notify_mutex);
+ if (sbi->s_kobj.state_in_sysfs)
+ sysfs_notify(&sbi->s_kobj, NULL, "errors_count");
+ mutex_unlock(&sbi->s_error_notify_mutex);
}
static struct kobject *ext4_root;
@@ -610,6 +613,7 @@ int ext4_register_sysfs(struct super_block *sb)
int err;
init_completion(&sbi->s_kobj_unregister);
+ mutex_init(&sbi->s_error_notify_mutex);
err = kobject_init_and_add(&sbi->s_kobj, &ext4_sb_ktype, ext4_root,
"%s", sb->s_id);
if (err) {
@@ -644,7 +648,10 @@ void ext4_unregister_sysfs(struct super_block *sb)
if (sbi->s_proc)
remove_proc_subtree(sb->s_id, ext4_proc_root);
+
+ mutex_lock(&sbi->s_error_notify_mutex);
kobject_del(&sbi->s_kobj);
+ mutex_unlock(&sbi->s_error_notify_mutex);
}
int __init ext4_init_sysfs(void)
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v1] ext4: fix use-after-free in update_super_work when racing with umount
2026-03-18 5:04 ` Jiayuan Chen
@ 2026-03-18 16:56 ` Jan Kara
0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2026-03-18 16:56 UTC (permalink / raw)
To: Jiayuan Chen
Cc: Jan Kara, linux-ext4, Theodore Ts'o, Andreas Dilger,
Ritesh Harjani, Ye Bin, linux-kernel
On Wed 18-03-26 13:04:03, Jiayuan Chen wrote:
>
> On 3/17/26 7:38 PM, Jan Kara wrote:
> > On Fri 13-03-26 14:52:04, Jiayuan Chen wrote:
> > > Commit b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
> > > moved ext4_unregister_sysfs() before flushing s_sb_upd_work to prevent new
> > > error work from being queued via /proc/fs/ext4/xx/mb_groups reads during
> > > unmount. However, this introduced a use-after-free because
> > > update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which
> > > accesses the kobject's kernfs_node after it has been freed:
> > >
> > > update_super_work ext4_put_super
> > > ----------------- --------------
> > > ext4_unregister_sysfs(sb)
> > > kobject_del(&sbi->s_kobj)
> > > __kobject_del()
> > > sysfs_remove_dir()
> > > kobj->sd = NULL
> > > sysfs_put(sd)
> > > kernfs_put() // RCU free
> > > ext4_notify_error_sysfs(sbi)
> > > sysfs_notify(&sbi->s_kobj)
> > > kn = kobj->sd // stale pointer
> > > kernfs_get(kn) // UAF on freed kernfs_node
> > > ext4_journal_destroy()
> > > flush_work(&sbi->s_sb_upd_work)
> > >
> > > The original blamed commit needed procfs removal before the work
> > > flush to prevent /proc/fs/ext4/xx/mb_groups reads from queuing new error
> > > work. But it bundled procfs removal and kobject_del together in
> > > ext4_unregister_sysfs(), causing the sysfs kobject to be torn down too
> > > early.
> > >
> > > The correct teardown ordering has three constraints:
> > >
> > > 1. procfs removal must happen before flushing s_sb_upd_work, to prevent
> > > /proc reads from queuing new error work that would BUG_ON.
> > > 2. sysfs kobject removal must happen after flushing s_sb_upd_work, since
> > > the work calls sysfs_notify() which accesses the kernfs_node.
> > > 3. sysfs kobject removal must happen before jbd2_journal_destroy(), since
> > > userspace could read the journal_task sysfs attribute and dereference
> > > j_task after the journal thread has been killed.
> > >
> > > Fix this by:
> > > - Adding ext4_sb_release_proc() to remove procfs entries early.
> > > - Splitting ext4_journal_destroy() into ext4_journal_stop_work() and
> > > ext4_journal_finish(), so that ext4_unregister_sysfs() can be placed
> > > between them to satisfy all three ordering constraints.
> > >
> > > Fixes: b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem")
> > > Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> > > Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> > Thanks for the analysis and the patch! I fully agree with your analysis but
> > I think your solution shows that the teardown sequence is just too subtle
> > (too many different dependencies). Ideally, we'd instead modify
> > ext4_notify_error_sysfs() to detect sysfs is already torn down by
> > ext4_unregister_sysfs() and do nothing. We can check
> > sbi->s_kobj.state_in_sysfs for that. The only trouble with this is that
> > sysfs_notify() could still race with kobject_del() so we also need some
> > locking in ext4_unregister_sysfs() locking out parallel
> > ext4_notify_error_sysfs() and we probably need to introduce a separate
> > mutex for that. What do you think?
> >
> > Honza
> >
>
>
> Hi Honza,
>
> Thanks for the suggestion !
>
> I tried the approach you described — having ext4_notify_error_sysfs() check
> s_kobj.state_in_sysfs and using a
> dedicated mutex to serialize against kobject_del() in
> ext4_unregister_sysfs(). It is indeed much simpler than my original
> approach of reordering the teardown sequence.
Yes, the patch looks good to me. Please create a proper patch submission
with changelog etc. and I can give you my reviewed-by tag :).
Honza
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b76966dc06c0..c012e85d2515 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1574,6 +1574,7 @@ struct ext4_sb_info {
> struct proc_dir_entry *s_proc;
> struct kobject s_kobj;
> struct completion s_kobj_unregister;
> + struct mutex s_error_notify_mutex; /* protects sysfs_notify vs
> kobject_del */
> struct super_block *s_sb;
> struct buffer_head *s_mmp_bh;
>
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index d2ecc1026c0c..f2505988e010 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> @@ -597,7 +597,10 @@ static const struct kobj_type ext4_feat_ktype = {
>
> void ext4_notify_error_sysfs(struct ext4_sb_info *sbi)
> {
> - sysfs_notify(&sbi->s_kobj, NULL, "errors_count");
> + mutex_lock(&sbi->s_error_notify_mutex);
> + if (sbi->s_kobj.state_in_sysfs)
> + sysfs_notify(&sbi->s_kobj, NULL, "errors_count");
> + mutex_unlock(&sbi->s_error_notify_mutex);
> }
>
> static struct kobject *ext4_root;
> @@ -610,6 +613,7 @@ int ext4_register_sysfs(struct super_block *sb)
> int err;
>
> init_completion(&sbi->s_kobj_unregister);
> + mutex_init(&sbi->s_error_notify_mutex);
> err = kobject_init_and_add(&sbi->s_kobj, &ext4_sb_ktype, ext4_root,
> "%s", sb->s_id);
> if (err) {
> @@ -644,7 +648,10 @@ void ext4_unregister_sysfs(struct super_block *sb)
>
> if (sbi->s_proc)
> remove_proc_subtree(sb->s_id, ext4_proc_root);
> +
> + mutex_lock(&sbi->s_error_notify_mutex);
> kobject_del(&sbi->s_kobj);
> + mutex_unlock(&sbi->s_error_notify_mutex);
> }
>
> int __init ext4_init_sysfs(void)
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-03-18 16:56 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-13 6:52 [PATCH v1] ext4: fix use-after-free in update_super_work when racing with umount Jiayuan Chen
2026-03-17 11:38 ` Jan Kara
2026-03-18 5:04 ` Jiayuan Chen
2026-03-18 16:56 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox