public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle
@ 2025-03-18  7:52 Ojaswin Mujoo
  2025-03-18  7:52 ` [PATCH v4 1/3] ext4: define ext4_journal_destroy wrapper Ojaswin Mujoo
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Ojaswin Mujoo @ 2025-03-18  7:52 UTC (permalink / raw)
  To: linux-ext4, Theodore Ts'o
  Cc: Jan Kara, Baokun Li, Ritesh Harjani, linux-kernel

** Changes in v4 [4] **

  * some minor refactoring and typo fix

[4] https://lore.kernel.org/linux-ext4/cover.1741938027.git.ojaswin@linux.ibm.com/T/#m8b5191fef8b201246ab5b34f7dc11b79fe6afe99

** Changes in v3 [3] **

 * Picked RVBs from Jan and Baokun
 * In patch 2/3 we now do the following:
   set flag -> force commit -> flush wq
 * use sbi->s_mount_flags instead of an
   sbi field for journal destroy flag

[3] https://lore.kernel.org/linux-ext4/Z86b0c_qTURBBkOW@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com/T/#m3a6c2e9205381d86460267e7be37ae0db688b800

** Changes in v2 [1] **

 * Picked up RVBs from Jan and Ritesh
 * In patch 2/3, we now use a flag in sbi instead of SB_ACITVE
   to determine when to journal sb vs when to commit directly.
 * Added a prep patch 1/3

[1] https://lore.kernel.org/linux-ext4/cover.1740212945.git.ojaswin@linux.ibm.com/T/#m5e659425b8c8fe2ac01e7242b77fed315ff89db4

@Baokun, I didn't get a chance to look into the journal_inode
modifications we were discussing in [2]. I'll try to spend some time and
send that as a separate patch. Hope that's okay

[2] https://lore.kernel.org/linux-ext4/cover.1740212945.git.ojaswin@linux.ibm.com/T/#mad8feb44d9b6ddadf87830b92caa7b78d902dc05
 
** Original Cover **

When running LTP stress tests on ext4, after a multiday run we seemed to
have hit the following BUG_ON:

 [NIP  : start_this_handle+268]
 #3 [c000001067c27a40] start_this_handle at c008000004d40f74 [jbd2]  (unreliable)
 #4 [c000001067c27b60] jbd2__journal_start at c008000004d415cc [jbd2]
 #5 [c000001067c27be0] update_super_work at c0080000053f9758 [ext4]
 #6 [c000001067c27c70] process_one_work at c000000000188790
 #7 [c000001067c27d20] worker_thread at c00000000018973c
 #8 [c000001067c27dc0] kthread at c000000000196c84
 #9 [c000001067c27e10] ret_from_kernel_thread at c00000000000cd64

Which comes out to

  382   repeat:
  383           read_lock(&journal->j_state_lock);
* 384           BUG_ON(journal->j_flags & JBD2_UNMOUNT);
  385           if (is_journal_aborted(journal) ||
  386               (journal->j_errno != 0 && !(journal->j_flags & JBD2_ACK_ERR))) {
  387                   read_unlock(&journal->j_state_lock);


Initially this seemed like it should never happen but upon crash
analysis it seems like it could indeed be hit as described in patch 1/2.

I would like to add that through the logs we only knew that:

- ext4_journal_bmap -> ext4_map_blocks is failing with EFSCORRUPTED.
- update_super_work had hit the BUG_ON

I was not able to hit this bug again (without modifying the kernel to
inject errors) but the above backtrace seems to be one possible paths
where this BUG_ON can be hit. Rest of the analysis and fix is in patch
2/3. Patch 3 is just a small tweak that i found helpful while debugging.

That being said, journalling is something I'm not very familiar with and
there might be gaps in my understanding so thoughts and suggestions are
welcome.


Ojaswin Mujoo (3):
  ext4: define ext4_journal_destroy wrapper
  ext4: avoid journaling sb update on error if journal is destroying
  ext4: Make sb update interval tunable

 fs/ext4/ext4.h      | 12 +++++++++++-
 fs/ext4/ext4_jbd2.h | 29 ++++++++++++++++++++++++++++
 fs/ext4/super.c     | 47 +++++++++++++++++++++------------------------
 fs/ext4/sysfs.c     |  4 ++++
 4 files changed, 66 insertions(+), 26 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v4 1/3] ext4: define ext4_journal_destroy wrapper
  2025-03-18  7:52 [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Ojaswin Mujoo
@ 2025-03-18  7:52 ` Ojaswin Mujoo
  2025-03-18  7:52 ` [PATCH v4 2/3] ext4: avoid journaling sb update on error if journal is destroying Ojaswin Mujoo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Ojaswin Mujoo @ 2025-03-18  7:52 UTC (permalink / raw)
  To: linux-ext4, Theodore Ts'o
  Cc: Jan Kara, Baokun Li, Ritesh Harjani, linux-kernel

Define an ext4 wrapper over jbd2_journal_destroy to make sure we
have consistent behavior during journal destruction. This will also
come useful in the next patch where we add some ext4 specific logic
in the destroy path.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
 fs/ext4/ext4_jbd2.h | 14 ++++++++++++++
 fs/ext4/super.c     | 16 ++++++----------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 3f2596c9e5f2..9b3c9df02a39 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -429,4 +429,18 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
 	return 1;
 }
 
+/*
+ * Pass journal explicitly as it may not be cached in the sbi->s_journal in some
+ * cases
+ */
+static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *journal)
+{
+	int err = 0;
+
+	err = jbd2_journal_destroy(journal);
+	sbi->s_journal = NULL;
+
+	return err;
+}
+
 #endif	/* _EXT4_JBD2_H */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a963ffda692a..8ad664d47806 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1297,8 +1297,7 @@ static void ext4_put_super(struct super_block *sb)
 
 	if (sbi->s_journal) {
 		aborted = is_journal_aborted(sbi->s_journal);
-		err = jbd2_journal_destroy(sbi->s_journal);
-		sbi->s_journal = NULL;
+		err = ext4_journal_destroy(sbi, sbi->s_journal);
 		if ((err < 0) && !aborted) {
 			ext4_abort(sb, -err, "Couldn't clean up the journal");
 		}
@@ -4960,8 +4959,7 @@ static int ext4_load_and_init_journal(struct super_block *sb,
 out:
 	/* flush s_sb_upd_work before destroying the journal. */
 	flush_work(&sbi->s_sb_upd_work);
-	jbd2_journal_destroy(sbi->s_journal);
-	sbi->s_journal = NULL;
+	ext4_journal_destroy(sbi, sbi->s_journal);
 	return -EINVAL;
 }
 
@@ -5652,8 +5650,7 @@ failed_mount8: __maybe_unused
 	if (sbi->s_journal) {
 		/* flush s_sb_upd_work before journal destroy. */
 		flush_work(&sbi->s_sb_upd_work);
-		jbd2_journal_destroy(sbi->s_journal);
-		sbi->s_journal = NULL;
+		ext4_journal_destroy(sbi, sbi->s_journal);
 	}
 failed_mount3a:
 	ext4_es_unregister_shrinker(sbi);
@@ -5958,7 +5955,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
 	return journal;
 
 out_journal:
-	jbd2_journal_destroy(journal);
+	ext4_journal_destroy(EXT4_SB(sb), journal);
 out_bdev:
 	bdev_fput(bdev_file);
 	return ERR_PTR(errno);
@@ -6075,8 +6072,7 @@ static int ext4_load_journal(struct super_block *sb,
 	EXT4_SB(sb)->s_journal = journal;
 	err = ext4_clear_journal_err(sb, es);
 	if (err) {
-		EXT4_SB(sb)->s_journal = NULL;
-		jbd2_journal_destroy(journal);
+		ext4_journal_destroy(EXT4_SB(sb), journal);
 		return err;
 	}
 
@@ -6094,7 +6090,7 @@ static int ext4_load_journal(struct super_block *sb,
 	return 0;
 
 err_out:
-	jbd2_journal_destroy(journal);
+	ext4_journal_destroy(EXT4_SB(sb), journal);
 	return err;
 }
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 2/3] ext4: avoid journaling sb update on error if journal is destroying
  2025-03-18  7:52 [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Ojaswin Mujoo
  2025-03-18  7:52 ` [PATCH v4 1/3] ext4: define ext4_journal_destroy wrapper Ojaswin Mujoo
@ 2025-03-18  7:52 ` Ojaswin Mujoo
  2025-03-18 10:08   ` Jan Kara
  2025-03-18  7:52 ` [PATCH v4 3/3] ext4: Make sb update interval tunable Ojaswin Mujoo
  2025-03-22  3:36 ` [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Theodore Ts'o
  3 siblings, 1 reply; 6+ messages in thread
From: Ojaswin Mujoo @ 2025-03-18  7:52 UTC (permalink / raw)
  To: linux-ext4, Theodore Ts'o
  Cc: Jan Kara, Baokun Li, Ritesh Harjani, linux-kernel, Mahesh Kumar

Presently we always BUG_ON if trying to start a transaction on a journal marked
with JBD2_UNMOUNT, since this should never happen. However, while ltp running
stress tests, it was observed that in case of some error handling paths, it is
possible for update_super_work to start a transaction after the journal is
destroyed eg:

(umount)
ext4_kill_sb
  kill_block_super
    generic_shutdown_super
      sync_filesystem /* commits all txns */
      evict_inodes
        /* might start a new txn */
      ext4_put_super
	flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
        jbd2_journal_destroy
          journal_kill_thread
            journal->j_flags |= JBD2_UNMOUNT;
          jbd2_journal_commit_transaction
            jbd2_journal_get_descriptor_buffer
              jbd2_journal_bmap
                ext4_journal_bmap
                  ext4_map_blocks
                    ...
                    ext4_inode_error
                      ext4_handle_error
                        schedule_work(&sbi->s_sb_upd_work)

                                               /* work queue kicks in */
                                               update_super_work
                                                 jbd2_journal_start
                                                   start_this_handle
                                                     BUG_ON(journal->j_flags &
                                                            JBD2_UNMOUNT)

Hence, introduce a new mount flag to indicate journal is destroying and only do
a journaled (and deferred) update of sb if this flag is not set. Otherwise, just
fallback to an un-journaled commit.

Further, in the journal destroy path, we have the following sequence:

  1. Set mount flag indicating journal is destroying
  2. force a commit and wait for it
  3. flush pending sb updates

This sequence is important as it ensures that, after this point, there is no sb
update that might be journaled so it is safe to update the sb outside the
journal. (To avoid race discussed in 2d01ddc86606)

Also, we don't need a similar check in ext4_grp_locked_error since it is only
called from mballoc and AFAICT it would be always valid to schedule work here.

Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
Reported-by: Mahesh Kumar <maheshkumar657g@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
 fs/ext4/ext4.h      |  3 ++-
 fs/ext4/ext4_jbd2.h | 15 +++++++++++++++
 fs/ext4/super.c     | 16 ++++++++--------
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2b7d781bfcad..0685bb68e64a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1823,7 +1823,8 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
  */
 enum {
 	EXT4_MF_MNTDIR_SAMPLED,
-	EXT4_MF_FC_INELIGIBLE	/* Fast commit ineligible */
+	EXT4_MF_FC_INELIGIBLE,	/* Fast commit ineligible */
+	EXT4_MF_JOURNAL_DESTROY	/* Journal is in process of destroying */
 };
 
 static inline void ext4_set_mount_flag(struct super_block *sb, int bit)
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 9b3c9df02a39..3221714d9901 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -437,6 +437,21 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour
 {
 	int err = 0;
 
+	/*
+	 * At this point only two things can be operating on the journal.
+	 * JBD2 thread performing transaction commit and s_sb_upd_work
+	 * issuing sb update through the journal. Once we set
+	 * EXT4_JOURNAL_DESTROY, new ext4_handle_error() calls will not
+	 * queue s_sb_upd_work and ext4_force_commit() makes sure any
+	 * ext4_handle_error() calls from the running transaction commit are
+	 * finished. Hence no new s_sb_upd_work can be queued after we
+	 * flush it here.
+	 */
+	ext4_set_mount_flag(sbi->s_sb, EXT4_MF_JOURNAL_DESTROY);
+
+	ext4_force_commit(sbi->s_sb);
+	flush_work(&sbi->s_sb_upd_work);
+
 	err = jbd2_journal_destroy(journal);
 	sbi->s_journal = NULL;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8ad664d47806..46f7c9922cda 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -704,9 +704,13 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
 		 * In case the fs should keep running, we need to writeout
 		 * superblock through the journal. Due to lock ordering
 		 * constraints, it may not be safe to do it right here so we
-		 * defer superblock flushing to a workqueue.
+		 * defer superblock flushing to a workqueue. We just need to be
+		 * careful when the journal is already shutting down. If we get
+		 * here in that case, just update the sb directly as the last
+		 * transaction won't commit anyway.
 		 */
-		if (continue_fs && journal)
+		if (continue_fs && journal &&
+		    !ext4_test_mount_flag(sb, EXT4_MF_JOURNAL_DESTROY))
 			schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
 		else
 			ext4_commit_super(sb);
@@ -1291,7 +1295,6 @@ static void ext4_put_super(struct super_block *sb)
 	ext4_unregister_li_request(sb);
 	ext4_quotas_off(sb, EXT4_MAXQUOTAS);
 
-	flush_work(&sbi->s_sb_upd_work);
 	destroy_workqueue(sbi->rsv_conversion_wq);
 	ext4_release_orphan_info(sb);
 
@@ -1301,7 +1304,8 @@ static void ext4_put_super(struct super_block *sb)
 		if ((err < 0) && !aborted) {
 			ext4_abort(sb, -err, "Couldn't clean up the journal");
 		}
-	}
+	} else
+		flush_work(&sbi->s_sb_upd_work);
 
 	ext4_es_unregister_shrinker(sbi);
 	timer_shutdown_sync(&sbi->s_err_report);
@@ -4957,8 +4961,6 @@ static int ext4_load_and_init_journal(struct super_block *sb,
 	return 0;
 
 out:
-	/* flush s_sb_upd_work before destroying the journal. */
-	flush_work(&sbi->s_sb_upd_work);
 	ext4_journal_destroy(sbi, sbi->s_journal);
 	return -EINVAL;
 }
@@ -5648,8 +5650,6 @@ failed_mount8: __maybe_unused
 	sbi->s_ea_block_cache = NULL;
 
 	if (sbi->s_journal) {
-		/* flush s_sb_upd_work before journal destroy. */
-		flush_work(&sbi->s_sb_upd_work);
 		ext4_journal_destroy(sbi, sbi->s_journal);
 	}
 failed_mount3a:
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 3/3] ext4: Make sb update interval tunable
  2025-03-18  7:52 [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Ojaswin Mujoo
  2025-03-18  7:52 ` [PATCH v4 1/3] ext4: define ext4_journal_destroy wrapper Ojaswin Mujoo
  2025-03-18  7:52 ` [PATCH v4 2/3] ext4: avoid journaling sb update on error if journal is destroying Ojaswin Mujoo
@ 2025-03-18  7:52 ` Ojaswin Mujoo
  2025-03-22  3:36 ` [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Theodore Ts'o
  3 siblings, 0 replies; 6+ messages in thread
From: Ojaswin Mujoo @ 2025-03-18  7:52 UTC (permalink / raw)
  To: linux-ext4, Theodore Ts'o
  Cc: Jan Kara, Baokun Li, Ritesh Harjani, linux-kernel

Currently, outside error paths, we auto commit the super block after 1
hour has passed and 16MB worth of updates have been written since last
commit. This is a policy decision so make this tunable while keeping the
defaults same. This is useful if user wants to tweak the superblock
behavior or for debugging the codepath by allowing to trigger it more
frequently.

We can now tweak the super block update using sb_update_sec and
sb_update_kb files in /sys/fs/ext4/<dev>/

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
 fs/ext4/ext4.h  |  9 +++++++++
 fs/ext4/super.c | 15 ++++++++-------
 fs/ext4/sysfs.c |  4 ++++
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0685bb68e64a..5259825396c4 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1608,6 +1608,8 @@ struct ext4_sb_info {
 	unsigned int s_mb_prefetch;
 	unsigned int s_mb_prefetch_limit;
 	unsigned int s_mb_best_avail_max_trim_order;
+	unsigned int s_sb_update_sec;
+	unsigned int s_sb_update_kb;
 
 	/* stats for buddy allocator */
 	atomic_t s_bal_reqs;	/* number of reqs with len > 1 */
@@ -2280,6 +2282,13 @@ static inline int ext4_forced_shutdown(struct super_block *sb)
 #define EXT4_DEF_MIN_BATCH_TIME	0
 #define EXT4_DEF_MAX_BATCH_TIME	15000 /* 15ms */
 
+/*
+ * Default values for superblock update
+ */
+#define EXT4_DEF_SB_UPDATE_INTERVAL_SEC (3600) /* seconds (1 hour) */
+#define EXT4_DEF_SB_UPDATE_INTERVAL_KB (16384) /* kilobytes (16MB) */
+
+
 /*
  * Minimum number of groups in a flexgroup before we separate out
  * directories into the first block group of a flexgroup
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 46f7c9922cda..9eb1ae2e84bb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -447,9 +447,6 @@ static time64_t __ext4_get_tstamp(__le32 *lo, __u8 *hi)
 #define ext4_get_tstamp(es, tstamp) \
 	__ext4_get_tstamp(&(es)->tstamp, &(es)->tstamp ## _hi)
 
-#define EXT4_SB_REFRESH_INTERVAL_SEC (3600) /* seconds (1 hour) */
-#define EXT4_SB_REFRESH_INTERVAL_KB (16384) /* kilobytes (16MB) */
-
 /*
  * The ext4_maybe_update_superblock() function checks and updates the
  * superblock if needed.
@@ -457,8 +454,10 @@ static time64_t __ext4_get_tstamp(__le32 *lo, __u8 *hi)
  * This function is designed to update the on-disk superblock only under
  * certain conditions to prevent excessive disk writes and unnecessary
  * waking of the disk from sleep. The superblock will be updated if:
- * 1. More than an hour has passed since the last superblock update, and
- * 2. More than 16MB have been written since the last superblock update.
+ * 1. More than sbi->s_sb_update_sec (def: 1 hour) has passed since the last
+ *    superblock update
+ * 2. More than sbi->s_sb_update_kb (def: 16MB) kbs have been written since the
+ *    last superblock update.
  *
  * @sb: The superblock
  */
@@ -479,7 +478,7 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
 	now = ktime_get_real_seconds();
 	last_update = ext4_get_tstamp(es, s_wtime);
 
-	if (likely(now - last_update < EXT4_SB_REFRESH_INTERVAL_SEC))
+	if (likely(now - last_update < sbi->s_sb_update_sec))
 		return;
 
 	lifetime_write_kbytes = sbi->s_kbytes_written +
@@ -494,7 +493,7 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
 	 */
 	diff_size = lifetime_write_kbytes - le64_to_cpu(es->s_kbytes_written);
 
-	if (diff_size > EXT4_SB_REFRESH_INTERVAL_KB)
+	if (diff_size > sbi->s_sb_update_kb)
 		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
 }
 
@@ -5248,6 +5247,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	sbi->s_commit_interval = JBD2_DEFAULT_MAX_COMMIT_AGE * HZ;
 	sbi->s_min_batch_time = EXT4_DEF_MIN_BATCH_TIME;
 	sbi->s_max_batch_time = EXT4_DEF_MAX_BATCH_TIME;
+	sbi->s_sb_update_kb = EXT4_DEF_SB_UPDATE_INTERVAL_KB;
+	sbi->s_sb_update_sec = EXT4_DEF_SB_UPDATE_INTERVAL_SEC;
 
 	/*
 	 * set default s_li_wait_mult for lazyinit, for the case there is
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index ddb54608ca2e..987bd00f916a 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -254,6 +254,8 @@ EXT4_ATTR(journal_task, 0444, journal_task);
 EXT4_RW_ATTR_SBI_UI(mb_prefetch, s_mb_prefetch);
 EXT4_RW_ATTR_SBI_UI(mb_prefetch_limit, s_mb_prefetch_limit);
 EXT4_RW_ATTR_SBI_UL(last_trim_minblks, s_last_trim_minblks);
+EXT4_RW_ATTR_SBI_UI(sb_update_sec, s_sb_update_sec);
+EXT4_RW_ATTR_SBI_UI(sb_update_kb, s_sb_update_kb);
 
 static unsigned int old_bump_val = 128;
 EXT4_ATTR_PTR(max_writeback_mb_bump, 0444, pointer_ui, &old_bump_val);
@@ -305,6 +307,8 @@ static struct attribute *ext4_attrs[] = {
 	ATTR_LIST(mb_prefetch),
 	ATTR_LIST(mb_prefetch_limit),
 	ATTR_LIST(last_trim_minblks),
+	ATTR_LIST(sb_update_sec),
+	ATTR_LIST(sb_update_kb),
 	NULL,
 };
 ATTRIBUTE_GROUPS(ext4);
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 2/3] ext4: avoid journaling sb update on error if journal is destroying
  2025-03-18  7:52 ` [PATCH v4 2/3] ext4: avoid journaling sb update on error if journal is destroying Ojaswin Mujoo
@ 2025-03-18 10:08   ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2025-03-18 10:08 UTC (permalink / raw)
  To: Ojaswin Mujoo
  Cc: linux-ext4, Theodore Ts'o, Jan Kara, Baokun Li,
	Ritesh Harjani, linux-kernel, Mahesh Kumar

On Tue 18-03-25 13:22:56, Ojaswin Mujoo wrote:
> Presently we always BUG_ON if trying to start a transaction on a journal marked
> with JBD2_UNMOUNT, since this should never happen. However, while ltp running
> stress tests, it was observed that in case of some error handling paths, it is
> possible for update_super_work to start a transaction after the journal is
> destroyed eg:
> 
> (umount)
> ext4_kill_sb
>   kill_block_super
>     generic_shutdown_super
>       sync_filesystem /* commits all txns */
>       evict_inodes
>         /* might start a new txn */
>       ext4_put_super
> 	flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
>         jbd2_journal_destroy
>           journal_kill_thread
>             journal->j_flags |= JBD2_UNMOUNT;
>           jbd2_journal_commit_transaction
>             jbd2_journal_get_descriptor_buffer
>               jbd2_journal_bmap
>                 ext4_journal_bmap
>                   ext4_map_blocks
>                     ...
>                     ext4_inode_error
>                       ext4_handle_error
>                         schedule_work(&sbi->s_sb_upd_work)
> 
>                                                /* work queue kicks in */
>                                                update_super_work
>                                                  jbd2_journal_start
>                                                    start_this_handle
>                                                      BUG_ON(journal->j_flags &
>                                                             JBD2_UNMOUNT)
> 
> Hence, introduce a new mount flag to indicate journal is destroying and only do
> a journaled (and deferred) update of sb if this flag is not set. Otherwise, just
> fallback to an un-journaled commit.
> 
> Further, in the journal destroy path, we have the following sequence:
> 
>   1. Set mount flag indicating journal is destroying
>   2. force a commit and wait for it
>   3. flush pending sb updates
> 
> This sequence is important as it ensures that, after this point, there is no sb
> update that might be journaled so it is safe to update the sb outside the
> journal. (To avoid race discussed in 2d01ddc86606)
> 
> Also, we don't need a similar check in ext4_grp_locked_error since it is only
> called from mballoc and AFAICT it would be always valid to schedule work here.
> 
> Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
> Reported-by: Mahesh Kumar <maheshkumar657g@gmail.com>
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

									Honza

> ---
>  fs/ext4/ext4.h      |  3 ++-
>  fs/ext4/ext4_jbd2.h | 15 +++++++++++++++
>  fs/ext4/super.c     | 16 ++++++++--------
>  3 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 2b7d781bfcad..0685bb68e64a 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1823,7 +1823,8 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
>   */
>  enum {
>  	EXT4_MF_MNTDIR_SAMPLED,
> -	EXT4_MF_FC_INELIGIBLE	/* Fast commit ineligible */
> +	EXT4_MF_FC_INELIGIBLE,	/* Fast commit ineligible */
> +	EXT4_MF_JOURNAL_DESTROY	/* Journal is in process of destroying */
>  };
>  
>  static inline void ext4_set_mount_flag(struct super_block *sb, int bit)
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 9b3c9df02a39..3221714d9901 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -437,6 +437,21 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour
>  {
>  	int err = 0;
>  
> +	/*
> +	 * At this point only two things can be operating on the journal.
> +	 * JBD2 thread performing transaction commit and s_sb_upd_work
> +	 * issuing sb update through the journal. Once we set
> +	 * EXT4_JOURNAL_DESTROY, new ext4_handle_error() calls will not
> +	 * queue s_sb_upd_work and ext4_force_commit() makes sure any
> +	 * ext4_handle_error() calls from the running transaction commit are
> +	 * finished. Hence no new s_sb_upd_work can be queued after we
> +	 * flush it here.
> +	 */
> +	ext4_set_mount_flag(sbi->s_sb, EXT4_MF_JOURNAL_DESTROY);
> +
> +	ext4_force_commit(sbi->s_sb);
> +	flush_work(&sbi->s_sb_upd_work);
> +
>  	err = jbd2_journal_destroy(journal);
>  	sbi->s_journal = NULL;
>  
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 8ad664d47806..46f7c9922cda 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -704,9 +704,13 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
>  		 * In case the fs should keep running, we need to writeout
>  		 * superblock through the journal. Due to lock ordering
>  		 * constraints, it may not be safe to do it right here so we
> -		 * defer superblock flushing to a workqueue.
> +		 * defer superblock flushing to a workqueue. We just need to be
> +		 * careful when the journal is already shutting down. If we get
> +		 * here in that case, just update the sb directly as the last
> +		 * transaction won't commit anyway.
>  		 */
> -		if (continue_fs && journal)
> +		if (continue_fs && journal &&
> +		    !ext4_test_mount_flag(sb, EXT4_MF_JOURNAL_DESTROY))
>  			schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
>  		else
>  			ext4_commit_super(sb);
> @@ -1291,7 +1295,6 @@ static void ext4_put_super(struct super_block *sb)
>  	ext4_unregister_li_request(sb);
>  	ext4_quotas_off(sb, EXT4_MAXQUOTAS);
>  
> -	flush_work(&sbi->s_sb_upd_work);
>  	destroy_workqueue(sbi->rsv_conversion_wq);
>  	ext4_release_orphan_info(sb);
>  
> @@ -1301,7 +1304,8 @@ static void ext4_put_super(struct super_block *sb)
>  		if ((err < 0) && !aborted) {
>  			ext4_abort(sb, -err, "Couldn't clean up the journal");
>  		}
> -	}
> +	} else
> +		flush_work(&sbi->s_sb_upd_work);
>  
>  	ext4_es_unregister_shrinker(sbi);
>  	timer_shutdown_sync(&sbi->s_err_report);
> @@ -4957,8 +4961,6 @@ static int ext4_load_and_init_journal(struct super_block *sb,
>  	return 0;
>  
>  out:
> -	/* flush s_sb_upd_work before destroying the journal. */
> -	flush_work(&sbi->s_sb_upd_work);
>  	ext4_journal_destroy(sbi, sbi->s_journal);
>  	return -EINVAL;
>  }
> @@ -5648,8 +5650,6 @@ failed_mount8: __maybe_unused
>  	sbi->s_ea_block_cache = NULL;
>  
>  	if (sbi->s_journal) {
> -		/* flush s_sb_upd_work before journal destroy. */
> -		flush_work(&sbi->s_sb_upd_work);
>  		ext4_journal_destroy(sbi, sbi->s_journal);
>  	}
>  failed_mount3a:
> -- 
> 2.48.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle
  2025-03-18  7:52 [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Ojaswin Mujoo
                   ` (2 preceding siblings ...)
  2025-03-18  7:52 ` [PATCH v4 3/3] ext4: Make sb update interval tunable Ojaswin Mujoo
@ 2025-03-22  3:36 ` Theodore Ts'o
  3 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2025-03-22  3:36 UTC (permalink / raw)
  To: linux-ext4, Ojaswin Mujoo
  Cc: Theodore Ts'o, Jan Kara, Baokun Li, Ritesh Harjani,
	linux-kernel


On Tue, 18 Mar 2025 13:22:54 +0530, Ojaswin Mujoo wrote:
> ** Changes in v4 [4] **
> 
>   * some minor refactoring and typo fix
> 
> [4] https://lore.kernel.org/linux-ext4/cover.1741938027.git.ojaswin@linux.ibm.com/T/#m8b5191fef8b201246ab5b34f7dc11b79fe6afe99
> 
> ** Changes in v3 [3] **
> 
> [...]

Applied, thanks!

[1/3] ext4: define ext4_journal_destroy wrapper
      commit: 5a02a6204ca37e7c22fbb55a789c503f05e8e89a
[2/3] ext4: avoid journaling sb update on error if journal is destroying
      commit: ce2f26e73783b4a7c46a86e3af5b5c8de0971790
[3/3] ext4: Make sb update interval tunable
      commit: 896b02d0b9e7deb4a4eb365e13dd912b49916519

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-03-22  3:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-18  7:52 [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Ojaswin Mujoo
2025-03-18  7:52 ` [PATCH v4 1/3] ext4: define ext4_journal_destroy wrapper Ojaswin Mujoo
2025-03-18  7:52 ` [PATCH v4 2/3] ext4: avoid journaling sb update on error if journal is destroying Ojaswin Mujoo
2025-03-18 10:08   ` Jan Kara
2025-03-18  7:52 ` [PATCH v4 3/3] ext4: Make sb update interval tunable Ojaswin Mujoo
2025-03-22  3:36 ` [PATCH v4 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox