* [PATCH v1] jbd2: check transaction state before stopping handle
@ 2026-03-05 12:54 Baolin Liu
[not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>
0 siblings, 1 reply; 6+ messages in thread
From: Baolin Liu @ 2026-03-05 12:54 UTC (permalink / raw)
To: tytso, jack; +Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu
From: Baolin Liu <liubaolin@kylinos.cn>
When a transaction enters T_FLUSH or later states,
handle->h_transaction may still point to it.
If jbd2_journal_stop() or jbd2__journal_restart() is called,
stop_this_handle() checks t_updates > 0, but t_updates is
already 0 for these states, causing a kernel BUG.
Fix by checking transaction->t_state in jbd2_journal_stop()
and jbd2__journal_restart() before calling stop_this_handle().
If the transaction is not in T_RUNNING or T_LOCKED state,
clear handle->h_transaction and skip stop_this_handle().
Crash stack:
Call trace:
stop_this_handle+0x148/0x158
jbd2_journal_stop+0x198/0x388
__ext4_journal_stop+0x70/0xf0
ext4_create+0x12c/0x188
lookup_open+0x214/0x6d8
do_last+0x364/0x878
path_openat+0x6c/0x280
do_filp_open+0x70/0xe8
do_sys_open+0x178/0x200
sys_openat+0x3c/0x50
el0_svc_naked+0x44/0x48
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
---
fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index dca4b5d8aaaa..3779382dbb80 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records,
journal = transaction->t_journal;
tid = transaction->t_tid;
+ jbd2_debug(2, "restarting handle %p\n", handle);
+
+ /* Check if transaction is in invalid state */
+ if (transaction->t_state != T_RUNNING &&
+ transaction->t_state != T_LOCKED) {
+ if (current->journal_info == handle)
+ current->journal_info = NULL;
+ handle->h_transaction = NULL;
+ memalloc_nofs_restore(handle->saved_alloc_context);
+ goto skip_stop;
+ }
+
/*
* First unlink the handle from its current transaction, and start the
* commit on that.
*/
- jbd2_debug(2, "restarting handle %p\n", handle);
stop_this_handle(handle);
handle->h_transaction = NULL;
-
+skip_stop:
/*
* TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can
* get rid of pointless j_state_lock traffic like this.
@@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
memalloc_nofs_restore(handle->saved_alloc_context);
goto free_and_exit;
}
+ /* Check if transaction is in invalid state */
+ if (transaction->t_state != T_RUNNING &&
+ transaction->t_state != T_LOCKED) {
+ if (current->journal_info == handle)
+ current->journal_info = NULL;
+ handle->h_transaction = NULL;
+ memalloc_nofs_restore(handle->saved_alloc_context);
+ goto free_and_exit;
+ }
+
journal = transaction->t_journal;
tid = transaction->t_tid;
--
2.39.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle
[not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>
@ 2026-03-05 13:08 ` Jan Kara
2026-03-06 9:19 ` liubaolin
2026-03-05 13:11 ` liubaolin
1 sibling, 1 reply; 6+ messages in thread
From: Jan Kara @ 2026-03-05 13:08 UTC (permalink / raw)
To: Baolin Liu; +Cc: tytso, jack, linux-ext4, linux-kernel, wangguanyu, Baolin Liu
On Thu 05-03-26 20:57:18, Baolin Liu wrote:
> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote:
> >From: Baolin Liu <liubaolin@kylinos.cn>
> >
> >When a transaction enters T_FLUSH or later states,
> >handle->h_transaction may still point to it.
> >If jbd2_journal_stop() or jbd2__journal_restart() is called,
> >stop_this_handle() checks t_updates > 0, but t_updates is
> >already 0 for these states, causing a kernel BUG.
Which bug please? Do you mean
J_ASSERT(atomic_read(&transaction->t_updates) > 0)
? Anyway this just doesn't make sense. When stop_this_handle() the caller
is holding t_updates reference which stop_this_handle() is supposed to drop
and the transaction should never transition past T_LOCKED state. If you
have a handle that's pointing to a transaction past T_LOCKED state, there's
a bug somewhere and that bug needs to be fixed, not paper over it like you
do in this patch. More details about reproducer etc. would be useful.
Honza
> >
> >Fix by checking transaction->t_state in jbd2_journal_stop()
> >and jbd2__journal_restart() before calling stop_this_handle().
> >If the transaction is not in T_RUNNING or T_LOCKED state,
> >clear handle->h_transaction and skip stop_this_handle().
> >
> >Crash stack:
> > Call trace:
> > stop_this_handle+0x148/0x158
> > jbd2_journal_stop+0x198/0x388
> > __ext4_journal_stop+0x70/0xf0
> > ext4_create+0x12c/0x188
> > lookup_open+0x214/0x6d8
> > do_last+0x364/0x878
> > path_openat+0x6c/0x280
> > do_filp_open+0x70/0xe8
> > do_sys_open+0x178/0x200
> > sys_openat+0x3c/0x50
> > el0_svc_naked+0x44/0x48
> >
> >Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
> >---
> > fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
> > 1 file changed, 23 insertions(+), 2 deletions(-)
> >
> >diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> >index dca4b5d8aaaa..3779382dbb80 100644
> >--- a/fs/jbd2/transaction.c
> >+++ b/fs/jbd2/transaction.c
> >@@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records,
> > journal = transaction->t_journal;
> > tid = transaction->t_tid;
> >
> >+ jbd2_debug(2, "restarting handle %p\n", handle);
> >+
> >+ /* Check if transaction is in invalid state */
> >+ if (transaction->t_state != T_RUNNING &&
> >+ transaction->t_state != T_LOCKED) {
> >+ if (current->journal_info == handle)
> >+ current->journal_info = NULL;
> >+ handle->h_transaction = NULL;
> >+ memalloc_nofs_restore(handle->saved_alloc_context);
> >+ goto skip_stop;
> >+ }
> >+
> > /*
> > * First unlink the handle from its current transaction, and start the
> > * commit on that.
> > */
> >- jbd2_debug(2, "restarting handle %p\n", handle);
> > stop_this_handle(handle);
> > handle->h_transaction = NULL;
> >-
> >+skip_stop:
> > /*
> > * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can
> > * get rid of pointless j_state_lock traffic like this.
> >@@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
> > memalloc_nofs_restore(handle->saved_alloc_context);
> > goto free_and_exit;
> > }
> >+ /* Check if transaction is in invalid state */
> >+ if (transaction->t_state != T_RUNNING &&
> >+ transaction->t_state != T_LOCKED) {
> >+ if (current->journal_info == handle)
> >+ current->journal_info = NULL;
> >+ handle->h_transaction = NULL;
> >+ memalloc_nofs_restore(handle->saved_alloc_context);
> >+ goto free_and_exit;
> >+ }
> >+
> > journal = transaction->t_journal;
> > tid = transaction->t_tid;
> >
> >--
> >2.39.2
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle
[not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>
2026-03-05 13:08 ` Jan Kara
@ 2026-03-05 13:11 ` liubaolin
2026-03-06 1:20 ` yebin (H)
1 sibling, 1 reply; 6+ messages in thread
From: liubaolin @ 2026-03-05 13:11 UTC (permalink / raw)
To: tytso, jack
Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu, liubaolin12138
> Dear maintainers and community,
>
> Our customer reported a kernel crash issue. The kernel crashes in stop_this_handle() with:
> J_ASSERT(atomic_read(&transaction->t_updates) > 0);
>
> Crash stack:
> Call trace:
> stop_this_handle+0x148/0x158
> jbd2_journal_stop+0x198/0x388
> __ext4_journal_stop+0x70/0xf0
> ext4_create+0x12c/0x188
> ...
>
> From the vmcore dump, I found that handle->h_transaction->t_updates is 0 and handle->h_transaction->t_state is T_FINISHED.
> This means the handle is still bound to a transaction that has already completed.
>
> In the JBD2 commit process, once a transaction enters T_LOCKED state, the commit thread waits for all associated handles to complete (t_updates becomes 0).
> After T_LOCKED completes, the transaction transitions to T_FLUSH, T_COMMIT, and eventually T_FINISHED.
> At this point, t_updates is legitimately 0, and there should be no active handles bound to this transaction.
>
> This appears to be a low-probability race condition that occurs under high concurrency scenarios.
> The crash happened during ext4_create(), and at some rare timing point during the execution, a handle ended up bound to a transaction that had already completed.
> The symptom is clear: a handle is bound to a T_FINISHED transaction with t_updates == 0.
>
> To prevent this, I added a transaction state check in both jbd2_journal_stop() and jbd2__journal_restart() before calling stop_this_handle().
> If the transaction is not in T_RUNNING or T_LOCKED state (i.e., it's in T_FLUSH or later states),
> I clear handle->h_transaction, clear current->journal_info if needed, restore the memalloc_nofs context, and skip calling stop_this_handle().
> This is a defensive check to handle the edge case where a handle is bound to a transaction in an invalid state.
>
> Please let me know if you have any questions or concerns.
>
> Best regards,
> Baolin Liu
在 2026/3/5 20:57, Baolin Liu 写道:
>
> Add others
>
>
>
>
>
> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote:
>>From: Baolin Liu <liubaolin@kylinos.cn>
>>
>>When a transaction enters T_FLUSH or later states,
>>handle->h_transaction may still point to it.
>>If jbd2_journal_stop() or jbd2__journal_restart() is called,
>>stop_this_handle() checks t_updates > 0, but t_updates is
>>already 0 for these states, causing a kernel BUG.
>>
>>Fix by checking transaction->t_state in jbd2_journal_stop()
>>and jbd2__journal_restart() before calling stop_this_handle().
>>If the transaction is not in T_RUNNING or T_LOCKED state,
>>clear handle->h_transaction and skip stop_this_handle().
>>
>>Crash stack:
>> Call trace:
>> stop_this_handle+0x148/0x158
>> jbd2_journal_stop+0x198/0x388
>> __ext4_journal_stop+0x70/0xf0
>> ext4_create+0x12c/0x188
>> lookup_open+0x214/0x6d8
>> do_last+0x364/0x878
>> path_openat+0x6c/0x280
>> do_filp_open+0x70/0xe8
>> do_sys_open+0x178/0x200
>> sys_openat+0x3c/0x50
>> el0_svc_naked+0x44/0x48
>>
>>Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
>>---
>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
>> 1 file changed, 23 insertions(+), 2 deletions(-)
>>
>>diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>>index dca4b5d8aaaa..3779382dbb80 100644
>>--- a/fs/jbd2/transaction.c
>>+++ b/fs/jbd2/transaction.c
>>@@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records,
>> journal = transaction->t_journal;
>> tid = transaction->t_tid;
>>
>>+ jbd2_debug(2, "restarting handle %p\n", handle);
>>+
>>+ /* Check if transaction is in invalid state */
>>+ if (transaction->t_state != T_RUNNING &&
>>+ transaction->t_state != T_LOCKED) {
>>+ if (current->journal_info == handle)
>>+ current->journal_info = NULL;
>>+ handle->h_transaction = NULL;
>>+ memalloc_nofs_restore(handle->saved_alloc_context);
>>+ goto skip_stop;
>>+ }
>>+
>> /*
>> * First unlink the handle from its current transaction, and start the
>> * commit on that.
>> */
>>- jbd2_debug(2, "restarting handle %p\n", handle);
>> stop_this_handle(handle);
>> handle->h_transaction = NULL;
>>-
>>+skip_stop:
>> /*
>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can
>> * get rid of pointless j_state_lock traffic like this.
>>@@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
>> memalloc_nofs_restore(handle->saved_alloc_context);
>> goto free_and_exit;
>> }
>>+ /* Check if transaction is in invalid state */
>>+ if (transaction->t_state != T_RUNNING &&
>>+ transaction->t_state != T_LOCKED) {
>>+ if (current->journal_info == handle)
>>+ current->journal_info = NULL;
>>+ handle->h_transaction = NULL;
>>+ memalloc_nofs_restore(handle->saved_alloc_context);
>>+ goto free_and_exit;
>>+ }
>>+
>> journal = transaction->t_journal;
>> tid = transaction->t_tid;
>>
>>--
>>2.39.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle
2026-03-05 13:11 ` liubaolin
@ 2026-03-06 1:20 ` yebin (H)
2026-03-06 10:00 ` liubaolin
0 siblings, 1 reply; 6+ messages in thread
From: yebin (H) @ 2026-03-06 1:20 UTC (permalink / raw)
To: liubaolin, tytso, jack; +Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu
On 2026/3/5 21:11, liubaolin wrote:
>> Dear maintainers and community,
>>
>> Our customer reported a kernel crash issue. The kernel crashes in
>> stop_this_handle() with:
>> J_ASSERT(atomic_read(&transaction->t_updates) > 0);
>>
>> Crash stack:
>> Call trace:
>> stop_this_handle+0x148/0x158
>> jbd2_journal_stop+0x198/0x388
>> __ext4_journal_stop+0x70/0xf0
>> ext4_create+0x12c/0x188
>> ...
>>
>> From the vmcore dump, I found that handle->h_transaction->t_updates is
>> 0 and handle->h_transaction->t_state is T_FINISHED. This means the
>> handle is still bound to a transaction that has already completed.
>>
>> In the JBD2 commit process, once a transaction enters T_LOCKED state,
>> the commit thread waits for all associated handles to complete
>> (t_updates becomes 0). After T_LOCKED completes, the transaction
>> transitions to T_FLUSH, T_COMMIT, and eventually T_FINISHED. At this
>> point, t_updates is legitimately 0, and there should be no active
>> handles bound to this transaction.
>>
>> This appears to be a low-probability race condition that occurs under
>> high concurrency scenarios. The crash happened during ext4_create(),
I'm curious about what race condition causes this?
>> and at some rare timing point during the execution, a handle ended up
>> bound to a transaction that had already completed. The symptom is
>> clear: a handle is bound to a T_FINISHED transaction with t_updates == 0.
>>
>> To prevent this, I added a transaction state check in both
>> jbd2_journal_stop() and jbd2__journal_restart() before calling
>> stop_this_handle().
>> If the transaction is not in T_RUNNING or T_LOCKED state (i.e., it's
>> in T_FLUSH or later states), I clear handle->h_transaction, clear
>> current->journal_info if needed, restore the memalloc_nofs context,
>> and skip calling stop_this_handle(). This is a defensive check to
>> handle the edge case where a handle is bound to a transaction in an
>> invalid state.
>>
If this state occurs, there must be something wrong somewhere.
Assertions should also be added to detect such anomalies. I think
directly skipping the assertion check is not a good way to solve the
problem.
>> Please let me know if you have any questions or concerns.
>>
>> Best regards,
>> Baolin Liu
>
>
>
> 在 2026/3/5 20:57, Baolin Liu 写道:
>>
>> Add others
>>
>>
>>
>>
>>
>> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote:
>>> From: Baolin Liu <liubaolin@kylinos.cn>
>>>
>>> When a transaction enters T_FLUSH or later states,
>>> handle->h_transaction may still point to it.
>>> If jbd2_journal_stop() or jbd2__journal_restart() is called,
>>> stop_this_handle() checks t_updates > 0, but t_updates is
>>> already 0 for these states, causing a kernel BUG.
>>>
>>> Fix by checking transaction->t_state in jbd2_journal_stop()
>>> and jbd2__journal_restart() before calling stop_this_handle().
>>> If the transaction is not in T_RUNNING or T_LOCKED state,
>>> clear handle->h_transaction and skip stop_this_handle().
>>>
>>> Crash stack:
>>> Call trace:
>>> stop_this_handle+0x148/0x158
>>> jbd2_journal_stop+0x198/0x388
>>> __ext4_journal_stop+0x70/0xf0
>>> ext4_create+0x12c/0x188
>>> lookup_open+0x214/0x6d8
>>> do_last+0x364/0x878
>>> path_openat+0x6c/0x280
>>> do_filp_open+0x70/0xe8
>>> do_sys_open+0x178/0x200
>>> sys_openat+0x3c/0x50
>>> el0_svc_naked+0x44/0x48
>>>
>>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
>>> ---
>>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
>>> 1 file changed, 23 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>>> index dca4b5d8aaaa..3779382dbb80 100644
>>> --- a/fs/jbd2/transaction.c
>>> +++ b/fs/jbd2/transaction.c
>>> @@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int
>>> nblocks, int revoke_records,
>>> journal = transaction->t_journal;
>>> tid = transaction->t_tid;
>>>
>>> + jbd2_debug(2, "restarting handle %p\n", handle);
>>> +
>>> + /* Check if transaction is in invalid state */
>>> + if (transaction->t_state != T_RUNNING &&
>>> + transaction->t_state != T_LOCKED) {
>>> + if (current->journal_info == handle)
>>> + current->journal_info = NULL;
>>> + handle->h_transaction = NULL;
>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>> + goto skip_stop;
>>> + }
>>> +
>>> /*
>>> * First unlink the handle from its current transaction, and
>>> start the
>>> * commit on that.
>>> */
>>> - jbd2_debug(2, "restarting handle %p\n", handle);
>>> stop_this_handle(handle);
>>> handle->h_transaction = NULL;
>>> -
>>> +skip_stop:
>>> /*
>>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we
>>> can
>>> * get rid of pointless j_state_lock traffic like this.
>>> @@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
>>> memalloc_nofs_restore(handle->saved_alloc_context);
>>> goto free_and_exit;
>>> }
>>> + /* Check if transaction is in invalid state */
>>> + if (transaction->t_state != T_RUNNING &&
>>> + transaction->t_state != T_LOCKED) {
>>> + if (current->journal_info == handle)
>>> + current->journal_info = NULL;
>>> + handle->h_transaction = NULL;
>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>> + goto free_and_exit;
>>> + }
>>> +
>>> journal = transaction->t_journal;
>>> tid = transaction->t_tid;
>>>
>>> --
>>> 2.39.2
>
>
>
> .
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle
2026-03-05 13:08 ` Jan Kara
@ 2026-03-06 9:19 ` liubaolin
0 siblings, 0 replies; 6+ messages in thread
From: liubaolin @ 2026-03-06 9:19 UTC (permalink / raw)
To: Jan Kara; +Cc: tytso, jack, linux-ext4, linux-kernel, wangguanyu, Baolin Liu
Dear Honza,
Thank you for your feedback. Yes, the BUG is exactly at:
J_ASSERT(atomic_read(&transaction->t_updates) > 0)
> Dear Honza,
>
> Thank you for your feedback. Yes, the BUG is exactly at:
> J_ASSERT(atomic_read(&transaction->t_updates) > 0)
>
> I understand your concern that this shouldn't happen in normal operation. However, from the vmcore dump, we can confirm that:
> - handle->h_transaction->t_updates == 0
> - handle->h_transaction->t_state == T_FINISHED
> - The crash occurred in stop_this_handle() when it tried to assert t_updates > 0
>
> This is a real crash that happened in production. The crash stack shows it occurred during ext4_create(), and the process name was MemTableFlushTh.
>
> I understand you want us to find and fix the root cause. I've reviewed the code but haven't found an obvious bug from the code that would cause this issue.
> However, I suspect the issue might be related to credit exhaustion during ext4_create() that triggers a handle restart,
> but this is just speculation and I'm still studying the code to confirm.
>
> This crash only occurred once in production and we haven't been able to reproduce it. The crash happened under high concurrency, and appears to be timing-dependent.
> I will continue to investigate the code and try to find a way to reproduce this issue to identify the root cause.
>
> The patch I proposed is a defensive check to prevent the kernel BUG when this edge case occurs.
>
> If you have any thoughts on where to look for the root cause, I'd really appreciate any suggestions.
>
> Best regards,
> Baolin Liu
> Dear Honza,
>
> Thank you for your feedback. Yes, the BUG is exactly at:
> J_ASSERT(atomic_read(&transaction->t_updates) > 0)
>
> I understand your concern that this shouldn't happen in normal operation. However, from the vmcore dump, we can confirm that:
> - handle->h_transaction->t_updates == 0
> - handle->h_transaction->t_state == T_FINISHED
> - The crash occurred in stop_this_handle() when it tried to assert t_updates > 0
>
> This is a real crash that happened in production. The crash stack shows it occurred during ext4_create(), and the process name was MemTableFlushTh.
>
> I understand you want us to find and fix the root cause. I've reviewed the code but haven't found an obvious bug from the code that would cause this issue.
> However, I suspect the issue might be related to credit exhaustion during ext4_create() that triggers a handle restart,
> but this is just speculation and I'm still studying the code to confirm.
>
> This crash only occurred once in production and we haven't been able to reproduce it. The crash happened under high concurrency, and appears to be timing-dependent.
> I will continue to investigate the code and try to find a way to reproduce this issue to identify the root cause.
>
> The patch I proposed is a defensive check to prevent the kernel BUG when this edge case occurs.
>
> If you have any thoughts on where to look for the root cause, I'd really appreciate any suggestions.
>
> Best regards,
> Baolin Liu
在 2026/3/5 21:08, Jan Kara 写道:
> On Thu 05-03-26 20:57:18, Baolin Liu wrote:
>> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote:
>>> From: Baolin Liu <liubaolin@kylinos.cn>
>>>
>>> When a transaction enters T_FLUSH or later states,
>>> handle->h_transaction may still point to it.
>>> If jbd2_journal_stop() or jbd2__journal_restart() is called,
>>> stop_this_handle() checks t_updates > 0, but t_updates is
>>> already 0 for these states, causing a kernel BUG.
>
> Which bug please? Do you mean
>
> J_ASSERT(atomic_read(&transaction->t_updates) > 0)
>
> ? Anyway this just doesn't make sense. When stop_this_handle() the caller
> is holding t_updates reference which stop_this_handle() is supposed to drop
> and the transaction should never transition past T_LOCKED state. If you
> have a handle that's pointing to a transaction past T_LOCKED state, there's
> a bug somewhere and that bug needs to be fixed, not paper over it like you
> do in this patch. More details about reproducer etc. would be useful.
>
> Honza
>
>>>
>>> Fix by checking transaction->t_state in jbd2_journal_stop()
>>> and jbd2__journal_restart() before calling stop_this_handle().
>>> If the transaction is not in T_RUNNING or T_LOCKED state,
>>> clear handle->h_transaction and skip stop_this_handle().
>>>
>>> Crash stack:
>>> Call trace:
>>> stop_this_handle+0x148/0x158
>>> jbd2_journal_stop+0x198/0x388
>>> __ext4_journal_stop+0x70/0xf0
>>> ext4_create+0x12c/0x188
>>> lookup_open+0x214/0x6d8
>>> do_last+0x364/0x878
>>> path_openat+0x6c/0x280
>>> do_filp_open+0x70/0xe8
>>> do_sys_open+0x178/0x200
>>> sys_openat+0x3c/0x50
>>> el0_svc_naked+0x44/0x48
>>>
>>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
>>> ---
>>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
>>> 1 file changed, 23 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>>> index dca4b5d8aaaa..3779382dbb80 100644
>>> --- a/fs/jbd2/transaction.c
>>> +++ b/fs/jbd2/transaction.c
>>> @@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records,
>>> journal = transaction->t_journal;
>>> tid = transaction->t_tid;
>>>
>>> + jbd2_debug(2, "restarting handle %p\n", handle);
>>> +
>>> + /* Check if transaction is in invalid state */
>>> + if (transaction->t_state != T_RUNNING &&
>>> + transaction->t_state != T_LOCKED) {
>>> + if (current->journal_info == handle)
>>> + current->journal_info = NULL;
>>> + handle->h_transaction = NULL;
>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>> + goto skip_stop;
>>> + }
>>> +
>>> /*
>>> * First unlink the handle from its current transaction, and start the
>>> * commit on that.
>>> */
>>> - jbd2_debug(2, "restarting handle %p\n", handle);
>>> stop_this_handle(handle);
>>> handle->h_transaction = NULL;
>>> -
>>> +skip_stop:
>>> /*
>>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can
>>> * get rid of pointless j_state_lock traffic like this.
>>> @@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
>>> memalloc_nofs_restore(handle->saved_alloc_context);
>>> goto free_and_exit;
>>> }
>>> + /* Check if transaction is in invalid state */
>>> + if (transaction->t_state != T_RUNNING &&
>>> + transaction->t_state != T_LOCKED) {
>>> + if (current->journal_info == handle)
>>> + current->journal_info = NULL;
>>> + handle->h_transaction = NULL;
>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>> + goto free_and_exit;
>>> + }
>>> +
>>> journal = transaction->t_journal;
>>> tid = transaction->t_tid;
>>>
>>> --
>>> 2.39.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle
2026-03-06 1:20 ` yebin (H)
@ 2026-03-06 10:00 ` liubaolin
0 siblings, 0 replies; 6+ messages in thread
From: liubaolin @ 2026-03-06 10:00 UTC (permalink / raw)
To: yebin (H), tytso, jack; +Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu
> Dear yebin,
>
> Thank you for your feedback.
>
> From the crash, we can see this issue was triggered by the MemTableFlushTh process.
> I agree with your point that if this state occurs, there must be something wrong somewhere.
> Once we find and fix the root cause, we should add J_ASSERT((transaction->t_state == T_RUNNING) || (transaction->t_state == T_LOCKED)) to detect such anomalies, rather than directly skipping the assertion check.
>
> I will continue to investigate the root cause and try to find a way to reproduce this issue.
> If you have any thoughts or suggestions on this problem, I'd be happy to discuss.
>
> Best regards,
> Baolin Liu
在 2026/3/6 9:20, yebin (H) 写道:
>
>
> On 2026/3/5 21:11, liubaolin wrote:
>>> Dear maintainers and community,
>>>
>>> Our customer reported a kernel crash issue. The kernel crashes in
>>> stop_this_handle() with:
>>> J_ASSERT(atomic_read(&transaction->t_updates) > 0);
>>>
>>> Crash stack:
>>> Call trace:
>>> stop_this_handle+0x148/0x158
>>> jbd2_journal_stop+0x198/0x388
>>> __ext4_journal_stop+0x70/0xf0
>>> ext4_create+0x12c/0x188
>>> ...
>>>
>>> From the vmcore dump, I found that handle->h_transaction->t_updates is
>>> 0 and handle->h_transaction->t_state is T_FINISHED. This means the
>>> handle is still bound to a transaction that has already completed.
>>>
>>> In the JBD2 commit process, once a transaction enters T_LOCKED state,
>>> the commit thread waits for all associated handles to complete
>>> (t_updates becomes 0). After T_LOCKED completes, the transaction
>>> transitions to T_FLUSH, T_COMMIT, and eventually T_FINISHED. At this
>>> point, t_updates is legitimately 0, and there should be no active
>>> handles bound to this transaction.
>>>
>>> This appears to be a low-probability race condition that occurs under
>>> high concurrency scenarios. The crash happened during ext4_create(),
> I'm curious about what race condition causes this?
>
>>> and at some rare timing point during the execution, a handle ended up
>>> bound to a transaction that had already completed. The symptom is
>>> clear: a handle is bound to a T_FINISHED transaction with t_updates
>>> == 0.
>>>
>>> To prevent this, I added a transaction state check in both
>>> jbd2_journal_stop() and jbd2__journal_restart() before calling
>>> stop_this_handle().
>>> If the transaction is not in T_RUNNING or T_LOCKED state (i.e., it's
>>> in T_FLUSH or later states), I clear handle->h_transaction, clear
>>> current->journal_info if needed, restore the memalloc_nofs context,
>>> and skip calling stop_this_handle(). This is a defensive check to
>>> handle the edge case where a handle is bound to a transaction in an
>>> invalid state.
>>>
> If this state occurs, there must be something wrong somewhere.
> Assertions should also be added to detect such anomalies. I think
> directly skipping the assertion check is not a good way to solve the
> problem.
>>> Please let me know if you have any questions or concerns.
>>>
>>> Best regards,
>>> Baolin Liu
>>
>>
>>
>> 在 2026/3/5 20:57, Baolin Liu 写道:
>>>
>>> Add others
>>>
>>>
>>>
>>>
>>>
>>> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote:
>>>> From: Baolin Liu <liubaolin@kylinos.cn>
>>>>
>>>> When a transaction enters T_FLUSH or later states,
>>>> handle->h_transaction may still point to it.
>>>> If jbd2_journal_stop() or jbd2__journal_restart() is called,
>>>> stop_this_handle() checks t_updates > 0, but t_updates is
>>>> already 0 for these states, causing a kernel BUG.
>>>>
>>>> Fix by checking transaction->t_state in jbd2_journal_stop()
>>>> and jbd2__journal_restart() before calling stop_this_handle().
>>>> If the transaction is not in T_RUNNING or T_LOCKED state,
>>>> clear handle->h_transaction and skip stop_this_handle().
>>>>
>>>> Crash stack:
>>>> Call trace:
>>>> stop_this_handle+0x148/0x158
>>>> jbd2_journal_stop+0x198/0x388
>>>> __ext4_journal_stop+0x70/0xf0
>>>> ext4_create+0x12c/0x188
>>>> lookup_open+0x214/0x6d8
>>>> do_last+0x364/0x878
>>>> path_openat+0x6c/0x280
>>>> do_filp_open+0x70/0xe8
>>>> do_sys_open+0x178/0x200
>>>> sys_openat+0x3c/0x50
>>>> el0_svc_naked+0x44/0x48
>>>>
>>>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
>>>> ---
>>>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
>>>> 1 file changed, 23 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>>>> index dca4b5d8aaaa..3779382dbb80 100644
>>>> --- a/fs/jbd2/transaction.c
>>>> +++ b/fs/jbd2/transaction.c
>>>> @@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int
>>>> nblocks, int revoke_records,
>>>> journal = transaction->t_journal;
>>>> tid = transaction->t_tid;
>>>>
>>>> + jbd2_debug(2, "restarting handle %p\n", handle);
>>>> +
>>>> + /* Check if transaction is in invalid state */
>>>> + if (transaction->t_state != T_RUNNING &&
>>>> + transaction->t_state != T_LOCKED) {
>>>> + if (current->journal_info == handle)
>>>> + current->journal_info = NULL;
>>>> + handle->h_transaction = NULL;
>>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>>> + goto skip_stop;
>>>> + }
>>>> +
>>>> /*
>>>> * First unlink the handle from its current transaction, and
>>>> start the
>>>> * commit on that.
>>>> */
>>>> - jbd2_debug(2, "restarting handle %p\n", handle);
>>>> stop_this_handle(handle);
>>>> handle->h_transaction = NULL;
>>>> -
>>>> +skip_stop:
>>>> /*
>>>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we
>>>> can
>>>> * get rid of pointless j_state_lock traffic like this.
>>>> @@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
>>>> memalloc_nofs_restore(handle->saved_alloc_context);
>>>> goto free_and_exit;
>>>> }
>>>> + /* Check if transaction is in invalid state */
>>>> + if (transaction->t_state != T_RUNNING &&
>>>> + transaction->t_state != T_LOCKED) {
>>>> + if (current->journal_info == handle)
>>>> + current->journal_info = NULL;
>>>> + handle->h_transaction = NULL;
>>>> + memalloc_nofs_restore(handle->saved_alloc_context);
>>>> + goto free_and_exit;
>>>> + }
>>>> +
>>>> journal = transaction->t_journal;
>>>> tid = transaction->t_tid;
>>>>
>>>> --
>>>> 2.39.2
>>
>>
>>
>> .
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-06 10:00 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-05 12:54 [PATCH v1] jbd2: check transaction state before stopping handle Baolin Liu
[not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>
2026-03-05 13:08 ` Jan Kara
2026-03-06 9:19 ` liubaolin
2026-03-05 13:11 ` liubaolin
2026-03-06 1:20 ` yebin (H)
2026-03-06 10:00 ` liubaolin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox