From: "yebin (H)" <yebin10@huawei.com>
To: liubaolin <liubaolin12138@163.com>, <tytso@mit.edu>, <jack@suse.com>
Cc: <linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<wangguanyu@vivo.com>, Baolin Liu <liubaolin@kylinos.cn>
Subject: Re: [PATCH v1] jbd2: check transaction state before stopping handle
Date: Fri, 6 Mar 2026 09:20:53 +0800 [thread overview]
Message-ID: <69AA2BF5.3000403@huawei.com> (raw)
In-Reply-To: <e52f326a-8da2-457f-8aee-5729373b9582@163.com>
On 2026/3/5 21:11, liubaolin wrote:
>> Dear maintainers and community,
>>
>> Our customer reported a kernel crash issue. The kernel crashes in
>> stop_this_handle() with:
>> J_ASSERT(atomic_read(&transaction->t_updates) > 0);
>>
>> Crash stack:
>> Call trace:
>> stop_this_handle+0x148/0x158
>> jbd2_journal_stop+0x198/0x388
>> __ext4_journal_stop+0x70/0xf0
>> ext4_create+0x12c/0x188
>> ...
>>
>> From the vmcore dump, I found that handle->h_transaction->t_updates is
>> 0 and handle->h_transaction->t_state is T_FINISHED. This means the
>> handle is still bound to a transaction that has already completed.
>>
>> In the JBD2 commit process, once a transaction enters T_LOCKED state,
>> the commit thread waits for all associated handles to complete
>> (t_updates becomes 0). After T_LOCKED completes, the transaction
>> transitions to T_FLUSH, T_COMMIT, and eventually T_FINISHED. At this
>> point, t_updates is legitimately 0, and there should be no active
>> handles bound to this transaction.
>>
>> This appears to be a low-probability race condition that occurs under
>> high concurrency scenarios. The crash happened during ext4_create(),
I'm curious about what race condition causes this?
>> and at some rare timing point during the execution, a handle ended up
>> bound to a transaction that had already completed. The symptom is
>> clear: a handle is bound to a T_FINISHED transaction with t_updates == 0.
>>
>> To prevent this, I added a transaction state check in both
>> jbd2_journal_stop() and jbd2__journal_restart() before calling
>> stop_this_handle().
>> If the transaction is not in T_RUNNING or T_LOCKED state (i.e., it's
>> in T_FLUSH or later states), I clear handle->h_transaction, clear
>> current->journal_info if needed, restore the memalloc_nofs context,
>> and skip calling stop_this_handle(). This is a defensive check to
>> handle the edge case where a handle is bound to a transaction in an
>> invalid state.
>>
If this state occurs, there must be something wrong somewhere.
Assertions should also be added to detect such anomalies. I think
directly skipping the assertion check is not a good way to solve the
problem.
>> Please let me know if you have any questions or concerns.
>>
>> Best regards,
>> Baolin Liu
>
>
>
> 在 2026/3/5 20:57, Baolin Liu 写道:
>>
>> Add others
>>
>>
>>
>>
>>
>> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote:
>>> From: Baolin Liu <liubaolin@kylinos.cn>
>>>
>>> When a transaction enters T_FLUSH or later states,
>>> handle->h_transaction may still point to it.
>>> If jbd2_journal_stop() or jbd2__journal_restart() is called,
>>> stop_this_handle() checks t_updates > 0, but t_updates is
>>> already 0 for these states, causing a kernel BUG.
>>>
>>> Fix by checking transaction->t_state in jbd2_journal_stop()
>>> and jbd2__journal_restart() before calling stop_this_handle().
>>> If the transaction is not in T_RUNNING or T_LOCKED state,
>>> clear handle->h_transaction and skip stop_this_handle().
>>>
>>> Crash stack:
>>> Call trace:
>>> stop_this_handle+0x148/0x158
>>> jbd2_journal_stop+0x198/0x388
>>> __ext4_journal_stop+0x70/0xf0
>>> ext4_create+0x12c/0x188
>>> lookup_open+0x214/0x6d8
>>> do_last+0x364/0x878
>>> path_openat+0x6c/0x280
>>> do_filp_open+0x70/0xe8
>>> do_sys_open+0x178/0x200
>>> sys_openat+0x3c/0x50
>>> el0_svc_naked+0x44/0x48
>>>
>>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
>>> ---
>>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
>>> 1 file changed, 23 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>>> index dca4b5d8aaaa..3779382dbb80 100644
>>> --- a/fs/jbd2/transaction.c
>>> +++ b/fs/jbd2/transaction.c
>>> @@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int
>>> nblocks, int revoke_records,
>>> journal = transaction->t_journal;
>>> tid = transaction->t_tid;
>>>
>>> + jbd2_debug(2, "restarting handle %p\n", handle);
>>> +
>>> + /* Check if transaction is in invalid state */
>>> + if (transaction->t_state != T_RUNNING &&
>>> + transaction->t_state != T_LOCKED) {
>>> + if (current->journal_info == handle)
>>> + current->journal_info = NULL;
>>> + handle->h_transaction = NULL;
>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>> + goto skip_stop;
>>> + }
>>> +
>>> /*
>>> * First unlink the handle from its current transaction, and
>>> start the
>>> * commit on that.
>>> */
>>> - jbd2_debug(2, "restarting handle %p\n", handle);
>>> stop_this_handle(handle);
>>> handle->h_transaction = NULL;
>>> -
>>> +skip_stop:
>>> /*
>>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we
>>> can
>>> * get rid of pointless j_state_lock traffic like this.
>>> @@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
>>> memalloc_nofs_restore(handle->saved_alloc_context);
>>> goto free_and_exit;
>>> }
>>> + /* Check if transaction is in invalid state */
>>> + if (transaction->t_state != T_RUNNING &&
>>> + transaction->t_state != T_LOCKED) {
>>> + if (current->journal_info == handle)
>>> + current->journal_info = NULL;
>>> + handle->h_transaction = NULL;
>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>> + goto free_and_exit;
>>> + }
>>> +
>>> journal = transaction->t_journal;
>>> tid = transaction->t_tid;
>>>
>>> --
>>> 2.39.2
>
>
>
> .
>
next prev parent reply other threads:[~2026-03-06 1:20 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-05 12:54 [PATCH v1] jbd2: check transaction state before stopping handle Baolin Liu
[not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>
2026-03-05 13:08 ` Jan Kara
2026-03-06 9:19 ` liubaolin
2026-03-05 13:11 ` liubaolin
2026-03-06 1:20 ` yebin (H) [this message]
2026-03-06 10:00 ` liubaolin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=69AA2BF5.3000403@huawei.com \
--to=yebin10@huawei.com \
--cc=jack@suse.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=liubaolin12138@163.com \
--cc=liubaolin@kylinos.cn \
--cc=tytso@mit.edu \
--cc=wangguanyu@vivo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox