All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Jan Kara <jack@suse.cz>, Baokun Li <libaokun1@huawei.com>,
	linux-kernel@vger.kernel.org,
	Mahesh Kumar <maheshkumar657g@gmail.com>
Subject: Re: [PATCH v2 2/3] ext4: avoid journaling sb update on error if journal is destroying
Date: Sun, 09 Mar 2025 00:11:22 +0530	[thread overview]
Message-ID: <87cyergyb1.fsf@gmail.com> (raw)
In-Reply-To: <Z8xbLrdN3L1E50-G@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>

Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:

> On Sat, Mar 08, 2025 at 06:56:23PM +0530, Ritesh Harjani wrote:
>> Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
>> 
>> > On Sat, Mar 08, 2025 at 03:25:04PM +0530, Ritesh Harjani (IBM) wrote:
>> >> Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
>> >> 
>> >> > Presently we always BUG_ON if trying to start a transaction on a journal marked
>> >> > with JBD2_UNMOUNT, since this should never happen. However, while ltp running
>> >> > stress tests, it was observed that in case of some error handling paths, it is
>> >> > possible for update_super_work to start a transaction after the journal is
>> >> > destroyed eg:
>> >> >
>> >> > (umount)
>> >> > ext4_kill_sb
>> >> >   kill_block_super
>> >> >     generic_shutdown_super
>> >> >       sync_filesystem /* commits all txns */
>> >> >       evict_inodes
>> >> >         /* might start a new txn */
>> >> >       ext4_put_super
>> >> > 	flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
>> >> >         jbd2_journal_destroy
>> >> >           journal_kill_thread
>> >> >             journal->j_flags |= JBD2_UNMOUNT;
>> >> >           jbd2_journal_commit_transaction
>> >> >             jbd2_journal_get_descriptor_buffer
>> >> >               jbd2_journal_bmap
>> >> >                 ext4_journal_bmap
>> >> >                   ext4_map_blocks
>> >> >                     ...
>> >> >                     ext4_inode_error
>> >> >                       ext4_handle_error
>> >> >                         schedule_work(&sbi->s_sb_upd_work)
>> >> >
>> >> >                                                /* work queue kicks in */
>> >> >                                                update_super_work
>> >> >                                                  jbd2_journal_start
>> >> >                                                    start_this_handle
>> >> >                                                      BUG_ON(journal->j_flags &
>> >> >                                                             JBD2_UNMOUNT)
>> >> >
>> >> > Hence, introduce a new sbi flag s_journal_destroying to indicate journal is
>> >> > destroying only do a journaled (and deferred) update of sb if this flag is not
>> >> > set. Otherwise, just fallback to an un-journaled commit.
>> >> >
>> >> > We set sbi->s_journal_destroying = true only after all the FS updates are done
>> >> > during ext4_put_super() (except a running transaction that will get commited
>> >> > during jbd2_journal_destroy()). After this point, it is safe to commit the sb
>> >> > outside the journal as it won't race with a journaled update (refer
>> >> > 2d01ddc86606).
>> >> >
>> >> > Also, we don't need a similar check in ext4_grp_locked_error since it is only
>> >> > called from mballoc and AFAICT it would be always valid to schedule work here.
>> >> >
>> >> > Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
>> >> > Reported-by: Mahesh Kumar <maheshkumar657g@gmail.com>
>> >> > Suggested-by: Jan Kara <jack@suse.cz>
>> >> > Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>> >> > ---
>> >> >  fs/ext4/ext4.h      | 2 ++
>> >> >  fs/ext4/ext4_jbd2.h | 8 ++++++++
>> >> >  fs/ext4/super.c     | 4 +++-
>> >> >  3 files changed, 13 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> >> > index 2b7d781bfcad..d48e93bd5690 100644
>> >> > --- a/fs/ext4/ext4.h
>> >> > +++ b/fs/ext4/ext4.h
>> >> > @@ -1728,6 +1728,8 @@ struct ext4_sb_info {
>> >> >  	 */
>> >> >  	struct work_struct s_sb_upd_work;
>> >> >  
>> >> > +	bool s_journal_destorying;
>> >> > +
>> >> >  	/* Atomic write unit values in bytes */
>> >> >  	unsigned int s_awu_min;
>> >> >  	unsigned int s_awu_max;
>> >> > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
>> >> > index 9b3c9df02a39..6bd3ca84410d 100644
>> >> > --- a/fs/ext4/ext4_jbd2.h
>> >> > +++ b/fs/ext4/ext4_jbd2.h
>> >> > @@ -437,6 +437,14 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour
>> >> >  {
>> >> >  	int err = 0;
>> >> >  
>> >> > +	/*
>> >> > +	 * At this point all pending FS updates should be done except a possible
>> >> > +	 * running transaction (which will commit in jbd2_journal_destroy). It
>> >> > +	 * is now safe for any new errors to directly commit superblock rather
>> >> > +	 * than going via journal.
>> >> > +	 */
>> >> > +	sbi->s_journal_destorying = true;
>> >> 
>> >> This is not correct right. I think what we decided to set this flag
>> >> before we flush the workqueue. So that we don't schedule any new
>> >> work after this flag has been set. At least that is what I understood.
>> >> 
>> >> [1]: https://lore.kernel.org/all/87eczc6rlt.fsf@gmail.com/
>> >> 
>> >> -ritesh
>> >
>> > Hey Ritesh,
>> >
>> > Yes that is not correct, I missed that in my patch however we realised
>> > that adding it before flush_work() also has issues [1]. More
>> > specifically:
>> 
>> Ohk. right. 
>> 
>> >
>> >                      **kjournald2**
>> >                      jbd2_journal_commit_transaction()
>> >                      ...
>> >                      ext4_handle_error()
>> >                         /* s_journal_destorying is not set */
>> >                         if (journal && !s_journal_destorying)
>> 
>> Then maybe we should not schedule another work to update the superblock
>> via journalling, it the error itself occurred while were trying to
>> commit the journal txn? 
>> 
>> 
>> -ritesh
>
> Hmm, ideally yes that should not happen, but how can we achieve that?
> For example with the trace we saw:
>
>    **kjournald2**
>    jbd2_journal_commit_transaction()
>      jbd2_journal_get_descriptor_buffer
>        jbd2_journal_bmap
>          ext4_journal_bmap
>            ext4_map_blocks
>              ...
>              ext4_inode_error
>                ext4_handle_error
>                  schedule_work(&sbi->s_sb_upd_work)
>
> How do we tell ext4_handle_error that it is in the context of a
> committing txn.

So even if we identify that the current
jbd2_journal_commit_transaction() is coming from kjournald2(), that is
sufficient right? Because the only other place where we call
jbd2_journal_commit_transaction() is jbd2_journal_destroy() and that
happens after we can set few things from ext4_put_super() and flush work
is completed, correct? 


> We can't pass down an argument all the way down 
> cause that is not feasible. An sb level flag will also not work
> I think. Any thoughts on this?

I was thinking if we should have a per task flag? Something like
PF_KJOURNALD?  (Similar to how we have PF_KSWAPD or PF_KCOMPACTD)? This
can help us identify if we are a kjournald2() kthread.

That will help prevent scheduling another work item to start a new
transaction in case an error occurs while committing the currently
running transaction. Correct?

Now I don't know if we have any free bit available in current->flags. If
not shall we use current->journal_info pointer to have 0th bit as a
flag? Basically override current->journal_info to also store a flag.  We
can create a wrapper to get the journal_info from current by masking
this flag bit and use it to dereference journal_info.

But before going down that road, it's better to know what others think?

-ritesh


>
> regards,
> ojaswin
>
>> 
>> 
>> >   ext4_put_super()
>> >     sbi->s_journal_destorying = true;
>> >     flush_work(&sbi->s_sb_upd_work)
>> >                                       schedule_work()
>> >     jbd2_journal_destroy()
>> >      journal->j_flags |= JBD2_UNMOUNT;
>> >
>> >                                         jbd2_journal_start()
>> >                                          start_this_handle()
>> >                                            BUG_ON(JBD2_UNMOUNT)
>> >
>> > So the right thing to do seems to be that we need to force a journal
>> > commit before the final flush as well. [1] Has more info on this and
>> > some followup discussion as well.
>> >
>> > [1] https://lore.kernel.org/all/cover.1741270780.git.ojaswin@linux.ibm.com/T/#mc8046d47b357665bdbd2878c91e51eb660f94b3e
>> >
>> > Regards,
>> > ojaswin
>> >> 
>> >> 
>> >> > +
>> >> >  	err = jbd2_journal_destroy(journal);
>> >> >  	sbi->s_journal = NULL;
>> >> >  
>> >> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> >> > index 8ad664d47806..31552cf0519a 100644
>> >> > --- a/fs/ext4/super.c
>> >> > +++ b/fs/ext4/super.c
>> >> > @@ -706,7 +706,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
>> >> >  		 * constraints, it may not be safe to do it right here so we
>> >> >  		 * defer superblock flushing to a workqueue.
>> >> >  		 */
>> >> > -		if (continue_fs && journal)
>> >> > +		if (continue_fs && journal && !EXT4_SB(sb)->s_journal_destorying)
>> >> >  			schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
>> >> >  		else
>> >> >  			ext4_commit_super(sb);
>> >> > @@ -5311,6 +5311,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>> >> >  	spin_lock_init(&sbi->s_error_lock);
>> >> >  	INIT_WORK(&sbi->s_sb_upd_work, update_super_work);
>> >> >  
>> >> > +	sbi->s_journal_destorying = false;
>> >> > +
>> >> >  	err = ext4_group_desc_init(sb, es, logical_sb_block, &first_not_zeroed);
>> >> >  	if (err)
>> >> >  		goto failed_mount3;
>> >> > -- 
>> >> > 2.48.1

  reply	other threads:[~2025-03-08 18:57 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06 14:28 [PATCH v2 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Ojaswin Mujoo
2025-03-06 14:28 ` [PATCH v2 1/3] ext4: define ext4_journal_destroy wrapper Ojaswin Mujoo
2025-03-07 14:05   ` Jan Kara
2025-03-08  1:18   ` Baokun Li
2025-03-06 14:28 ` [PATCH v2 2/3] ext4: avoid journaling sb update on error if journal is destroying Ojaswin Mujoo
2025-03-07  2:49   ` Zhang Yi
2025-03-07  6:34     ` Ojaswin Mujoo
2025-03-07  8:13       ` Ojaswin Mujoo
2025-03-07  8:43         ` Zhang Yi
2025-03-07 10:27           ` Ojaswin Mujoo
2025-03-07 12:36             ` Zhang Yi
2025-03-07 17:26               ` Ojaswin Mujoo
2025-03-08  2:57                 ` Zhang Yi
2025-03-08  8:18                   ` Ojaswin Mujoo
2025-03-08  9:58                     ` Ojaswin Mujoo
2025-03-08 10:10                       ` Baokun Li
2025-03-08 12:57                         ` Ojaswin Mujoo
2025-03-08 10:01                   ` Ritesh Harjani
2025-03-07 14:26   ` Jan Kara
2025-03-07 17:00     ` Ojaswin Mujoo
2025-03-08  9:55   ` Ritesh Harjani (IBM)
2025-03-08 13:05     ` Ojaswin Mujoo
2025-03-08 13:26       ` Ritesh Harjani
2025-03-08 14:58         ` Ojaswin Mujoo
2025-03-08 18:41           ` Ritesh Harjani [this message]
2025-03-09 12:07             ` Ojaswin Mujoo
2025-03-10  4:43               ` Ritesh Harjani
2025-03-12 10:51                 ` Jan Kara
2025-03-12 14:26                   ` Ojaswin Mujoo
2025-03-12 17:15                     ` Jan Kara
2025-03-13  1:20                       ` Zhang Yi
2025-03-13  2:08                         ` Baokun Li
2025-03-12 13:57   ` Eric Sandeen
2025-03-12 14:27     ` Ojaswin Mujoo
2025-03-06 14:28 ` [PATCH v2 3/3] ext4: Make sb update interval tunable Ojaswin Mujoo
2025-03-08  2:25   ` Baokun Li
2025-03-08  1:09 ` [PATCH v2 0/3] Fix a BUG_ON crashing the kernel in start_this_handle Baokun Li
2025-03-08 13:07   ` Ojaswin Mujoo
2025-03-08 15:21     ` Ojaswin Mujoo
2025-03-10  3:10       ` Baokun Li
2025-03-10  7:59         ` Ojaswin Mujoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cyergyb1.fsf@gmail.com \
    --to=ritesh.list@gmail.com \
    --cc=jack@suse.cz \
    --cc=libaokun1@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maheshkumar657g@gmail.com \
    --cc=ojaswin@linux.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.