* [PATCH v1] jbd2: check transaction state before stopping handle
@ 2026-03-05 12:54 Baolin Liu
[not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>
0 siblings, 1 reply; 6+ messages in thread
From: Baolin Liu @ 2026-03-05 12:54 UTC (permalink / raw)
To: tytso, jack; +Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu
From: Baolin Liu <liubaolin@kylinos.cn>
When a transaction enters T_FLUSH or later states,
handle->h_transaction may still point to it.
If jbd2_journal_stop() or jbd2__journal_restart() is called,
stop_this_handle() checks t_updates > 0, but t_updates is
already 0 for these states, causing a kernel BUG.
Fix by checking transaction->t_state in jbd2_journal_stop()
and jbd2__journal_restart() before calling stop_this_handle().
If the transaction is not in T_RUNNING or T_LOCKED state,
clear handle->h_transaction and skip stop_this_handle().
Crash stack:
Call trace:
stop_this_handle+0x148/0x158
jbd2_journal_stop+0x198/0x388
__ext4_journal_stop+0x70/0xf0
ext4_create+0x12c/0x188
lookup_open+0x214/0x6d8
do_last+0x364/0x878
path_openat+0x6c/0x280
do_filp_open+0x70/0xe8
do_sys_open+0x178/0x200
sys_openat+0x3c/0x50
el0_svc_naked+0x44/0x48
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
---
fs/jbd2/transaction.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index dca4b5d8aaaa..3779382dbb80 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records,
journal = transaction->t_journal;
tid = transaction->t_tid;
+ jbd2_debug(2, "restarting handle %p\n", handle);
+
+ /* Check if transaction is in invalid state */
+ if (transaction->t_state != T_RUNNING &&
+ transaction->t_state != T_LOCKED) {
+ if (current->journal_info == handle)
+ current->journal_info = NULL;
+ handle->h_transaction = NULL;
+ memalloc_nofs_restore(handle->saved_alloc_context);
+ goto skip_stop;
+ }
+
/*
* First unlink the handle from its current transaction, and start the
* commit on that.
*/
- jbd2_debug(2, "restarting handle %p\n", handle);
stop_this_handle(handle);
handle->h_transaction = NULL;
-
+skip_stop:
/*
* TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can
* get rid of pointless j_state_lock traffic like this.
@@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle)
memalloc_nofs_restore(handle->saved_alloc_context);
goto free_and_exit;
}
+ /* Check if transaction is in invalid state */
+ if (transaction->t_state != T_RUNNING &&
+ transaction->t_state != T_LOCKED) {
+ if (current->journal_info == handle)
+ current->journal_info = NULL;
+ handle->h_transaction = NULL;
+ memalloc_nofs_restore(handle->saved_alloc_context);
+ goto free_and_exit;
+ }
+
journal = transaction->t_journal;
tid = transaction->t_tid;
--
2.39.2
^ permalink raw reply related [flat|nested] 6+ messages in thread[parent not found: <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>]
* Re: [PATCH v1] jbd2: check transaction state before stopping handle [not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com> @ 2026-03-05 13:08 ` Jan Kara 2026-03-06 9:19 ` liubaolin 2026-03-05 13:11 ` liubaolin 1 sibling, 1 reply; 6+ messages in thread From: Jan Kara @ 2026-03-05 13:08 UTC (permalink / raw) To: Baolin Liu; +Cc: tytso, jack, linux-ext4, linux-kernel, wangguanyu, Baolin Liu On Thu 05-03-26 20:57:18, Baolin Liu wrote: > At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote: > >From: Baolin Liu <liubaolin@kylinos.cn> > > > >When a transaction enters T_FLUSH or later states, > >handle->h_transaction may still point to it. > >If jbd2_journal_stop() or jbd2__journal_restart() is called, > >stop_this_handle() checks t_updates > 0, but t_updates is > >already 0 for these states, causing a kernel BUG. Which bug please? Do you mean J_ASSERT(atomic_read(&transaction->t_updates) > 0) ? Anyway this just doesn't make sense. When stop_this_handle() the caller is holding t_updates reference which stop_this_handle() is supposed to drop and the transaction should never transition past T_LOCKED state. If you have a handle that's pointing to a transaction past T_LOCKED state, there's a bug somewhere and that bug needs to be fixed, not paper over it like you do in this patch. More details about reproducer etc. would be useful. Honza > > > >Fix by checking transaction->t_state in jbd2_journal_stop() > >and jbd2__journal_restart() before calling stop_this_handle(). > >If the transaction is not in T_RUNNING or T_LOCKED state, > >clear handle->h_transaction and skip stop_this_handle(). > > > >Crash stack: > > Call trace: > > stop_this_handle+0x148/0x158 > > jbd2_journal_stop+0x198/0x388 > > __ext4_journal_stop+0x70/0xf0 > > ext4_create+0x12c/0x188 > > lookup_open+0x214/0x6d8 > > do_last+0x364/0x878 > > path_openat+0x6c/0x280 > > do_filp_open+0x70/0xe8 > > do_sys_open+0x178/0x200 > > sys_openat+0x3c/0x50 > > el0_svc_naked+0x44/0x48 > > > >Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> > >--- > > fs/jbd2/transaction.c | 25 +++++++++++++++++++++++-- > > 1 file changed, 23 insertions(+), 2 deletions(-) > > > >diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c > >index dca4b5d8aaaa..3779382dbb80 100644 > >--- a/fs/jbd2/transaction.c > >+++ b/fs/jbd2/transaction.c > >@@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, > > journal = transaction->t_journal; > > tid = transaction->t_tid; > > > >+ jbd2_debug(2, "restarting handle %p\n", handle); > >+ > >+ /* Check if transaction is in invalid state */ > >+ if (transaction->t_state != T_RUNNING && > >+ transaction->t_state != T_LOCKED) { > >+ if (current->journal_info == handle) > >+ current->journal_info = NULL; > >+ handle->h_transaction = NULL; > >+ memalloc_nofs_restore(handle->saved_alloc_context); > >+ goto skip_stop; > >+ } > >+ > > /* > > * First unlink the handle from its current transaction, and start the > > * commit on that. > > */ > >- jbd2_debug(2, "restarting handle %p\n", handle); > > stop_this_handle(handle); > > handle->h_transaction = NULL; > >- > >+skip_stop: > > /* > > * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can > > * get rid of pointless j_state_lock traffic like this. > >@@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle) > > memalloc_nofs_restore(handle->saved_alloc_context); > > goto free_and_exit; > > } > >+ /* Check if transaction is in invalid state */ > >+ if (transaction->t_state != T_RUNNING && > >+ transaction->t_state != T_LOCKED) { > >+ if (current->journal_info == handle) > >+ current->journal_info = NULL; > >+ handle->h_transaction = NULL; > >+ memalloc_nofs_restore(handle->saved_alloc_context); > >+ goto free_and_exit; > >+ } > >+ > > journal = transaction->t_journal; > > tid = transaction->t_tid; > > > >-- > >2.39.2 -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle 2026-03-05 13:08 ` Jan Kara @ 2026-03-06 9:19 ` liubaolin 0 siblings, 0 replies; 6+ messages in thread From: liubaolin @ 2026-03-06 9:19 UTC (permalink / raw) To: Jan Kara; +Cc: tytso, jack, linux-ext4, linux-kernel, wangguanyu, Baolin Liu Dear Honza, Thank you for your feedback. Yes, the BUG is exactly at: J_ASSERT(atomic_read(&transaction->t_updates) > 0) > Dear Honza, > > Thank you for your feedback. Yes, the BUG is exactly at: > J_ASSERT(atomic_read(&transaction->t_updates) > 0) > > I understand your concern that this shouldn't happen in normal operation. However, from the vmcore dump, we can confirm that: > - handle->h_transaction->t_updates == 0 > - handle->h_transaction->t_state == T_FINISHED > - The crash occurred in stop_this_handle() when it tried to assert t_updates > 0 > > This is a real crash that happened in production. The crash stack shows it occurred during ext4_create(), and the process name was MemTableFlushTh. > > I understand you want us to find and fix the root cause. I've reviewed the code but haven't found an obvious bug from the code that would cause this issue. > However, I suspect the issue might be related to credit exhaustion during ext4_create() that triggers a handle restart, > but this is just speculation and I'm still studying the code to confirm. > > This crash only occurred once in production and we haven't been able to reproduce it. The crash happened under high concurrency, and appears to be timing-dependent. > I will continue to investigate the code and try to find a way to reproduce this issue to identify the root cause. > > The patch I proposed is a defensive check to prevent the kernel BUG when this edge case occurs. > > If you have any thoughts on where to look for the root cause, I'd really appreciate any suggestions. > > Best regards, > Baolin Liu > Dear Honza, > > Thank you for your feedback. Yes, the BUG is exactly at: > J_ASSERT(atomic_read(&transaction->t_updates) > 0) > > I understand your concern that this shouldn't happen in normal operation. However, from the vmcore dump, we can confirm that: > - handle->h_transaction->t_updates == 0 > - handle->h_transaction->t_state == T_FINISHED > - The crash occurred in stop_this_handle() when it tried to assert t_updates > 0 > > This is a real crash that happened in production. The crash stack shows it occurred during ext4_create(), and the process name was MemTableFlushTh. > > I understand you want us to find and fix the root cause. I've reviewed the code but haven't found an obvious bug from the code that would cause this issue. > However, I suspect the issue might be related to credit exhaustion during ext4_create() that triggers a handle restart, > but this is just speculation and I'm still studying the code to confirm. > > This crash only occurred once in production and we haven't been able to reproduce it. The crash happened under high concurrency, and appears to be timing-dependent. > I will continue to investigate the code and try to find a way to reproduce this issue to identify the root cause. > > The patch I proposed is a defensive check to prevent the kernel BUG when this edge case occurs. > > If you have any thoughts on where to look for the root cause, I'd really appreciate any suggestions. > > Best regards, > Baolin Liu 在 2026/3/5 21:08, Jan Kara 写道: > On Thu 05-03-26 20:57:18, Baolin Liu wrote: >> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote: >>> From: Baolin Liu <liubaolin@kylinos.cn> >>> >>> When a transaction enters T_FLUSH or later states, >>> handle->h_transaction may still point to it. >>> If jbd2_journal_stop() or jbd2__journal_restart() is called, >>> stop_this_handle() checks t_updates > 0, but t_updates is >>> already 0 for these states, causing a kernel BUG. > > Which bug please? Do you mean > > J_ASSERT(atomic_read(&transaction->t_updates) > 0) > > ? Anyway this just doesn't make sense. When stop_this_handle() the caller > is holding t_updates reference which stop_this_handle() is supposed to drop > and the transaction should never transition past T_LOCKED state. If you > have a handle that's pointing to a transaction past T_LOCKED state, there's > a bug somewhere and that bug needs to be fixed, not paper over it like you > do in this patch. More details about reproducer etc. would be useful. > > Honza > >>> >>> Fix by checking transaction->t_state in jbd2_journal_stop() >>> and jbd2__journal_restart() before calling stop_this_handle(). >>> If the transaction is not in T_RUNNING or T_LOCKED state, >>> clear handle->h_transaction and skip stop_this_handle(). >>> >>> Crash stack: >>> Call trace: >>> stop_this_handle+0x148/0x158 >>> jbd2_journal_stop+0x198/0x388 >>> __ext4_journal_stop+0x70/0xf0 >>> ext4_create+0x12c/0x188 >>> lookup_open+0x214/0x6d8 >>> do_last+0x364/0x878 >>> path_openat+0x6c/0x280 >>> do_filp_open+0x70/0xe8 >>> do_sys_open+0x178/0x200 >>> sys_openat+0x3c/0x50 >>> el0_svc_naked+0x44/0x48 >>> >>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >>> --- >>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++-- >>> 1 file changed, 23 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c >>> index dca4b5d8aaaa..3779382dbb80 100644 >>> --- a/fs/jbd2/transaction.c >>> +++ b/fs/jbd2/transaction.c >>> @@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, >>> journal = transaction->t_journal; >>> tid = transaction->t_tid; >>> >>> + jbd2_debug(2, "restarting handle %p\n", handle); >>> + >>> + /* Check if transaction is in invalid state */ >>> + if (transaction->t_state != T_RUNNING && >>> + transaction->t_state != T_LOCKED) { >>> + if (current->journal_info == handle) >>> + current->journal_info = NULL; >>> + handle->h_transaction = NULL; >>> + memalloc_nofs_restore(handle->saved_alloc_context); >>> + goto skip_stop; >>> + } >>> + >>> /* >>> * First unlink the handle from its current transaction, and start the >>> * commit on that. >>> */ >>> - jbd2_debug(2, "restarting handle %p\n", handle); >>> stop_this_handle(handle); >>> handle->h_transaction = NULL; >>> - >>> +skip_stop: >>> /* >>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can >>> * get rid of pointless j_state_lock traffic like this. >>> @@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle) >>> memalloc_nofs_restore(handle->saved_alloc_context); >>> goto free_and_exit; >>> } >>> + /* Check if transaction is in invalid state */ >>> + if (transaction->t_state != T_RUNNING && >>> + transaction->t_state != T_LOCKED) { >>> + if (current->journal_info == handle) >>> + current->journal_info = NULL; >>> + handle->h_transaction = NULL; >>> + memalloc_nofs_restore(handle->saved_alloc_context); >>> + goto free_and_exit; >>> + } >>> + >>> journal = transaction->t_journal; >>> tid = transaction->t_tid; >>> >>> -- >>> 2.39.2 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle [not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com> 2026-03-05 13:08 ` Jan Kara @ 2026-03-05 13:11 ` liubaolin 2026-03-06 1:20 ` yebin (H) 1 sibling, 1 reply; 6+ messages in thread From: liubaolin @ 2026-03-05 13:11 UTC (permalink / raw) To: tytso, jack Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu, liubaolin12138 > Dear maintainers and community, > > Our customer reported a kernel crash issue. The kernel crashes in stop_this_handle() with: > J_ASSERT(atomic_read(&transaction->t_updates) > 0); > > Crash stack: > Call trace: > stop_this_handle+0x148/0x158 > jbd2_journal_stop+0x198/0x388 > __ext4_journal_stop+0x70/0xf0 > ext4_create+0x12c/0x188 > ... > > From the vmcore dump, I found that handle->h_transaction->t_updates is 0 and handle->h_transaction->t_state is T_FINISHED. > This means the handle is still bound to a transaction that has already completed. > > In the JBD2 commit process, once a transaction enters T_LOCKED state, the commit thread waits for all associated handles to complete (t_updates becomes 0). > After T_LOCKED completes, the transaction transitions to T_FLUSH, T_COMMIT, and eventually T_FINISHED. > At this point, t_updates is legitimately 0, and there should be no active handles bound to this transaction. > > This appears to be a low-probability race condition that occurs under high concurrency scenarios. > The crash happened during ext4_create(), and at some rare timing point during the execution, a handle ended up bound to a transaction that had already completed. > The symptom is clear: a handle is bound to a T_FINISHED transaction with t_updates == 0. > > To prevent this, I added a transaction state check in both jbd2_journal_stop() and jbd2__journal_restart() before calling stop_this_handle(). > If the transaction is not in T_RUNNING or T_LOCKED state (i.e., it's in T_FLUSH or later states), > I clear handle->h_transaction, clear current->journal_info if needed, restore the memalloc_nofs context, and skip calling stop_this_handle(). > This is a defensive check to handle the edge case where a handle is bound to a transaction in an invalid state. > > Please let me know if you have any questions or concerns. > > Best regards, > Baolin Liu 在 2026/3/5 20:57, Baolin Liu 写道: > > Add others > > > > > > At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote: >>From: Baolin Liu <liubaolin@kylinos.cn> >> >>When a transaction enters T_FLUSH or later states, >>handle->h_transaction may still point to it. >>If jbd2_journal_stop() or jbd2__journal_restart() is called, >>stop_this_handle() checks t_updates > 0, but t_updates is >>already 0 for these states, causing a kernel BUG. >> >>Fix by checking transaction->t_state in jbd2_journal_stop() >>and jbd2__journal_restart() before calling stop_this_handle(). >>If the transaction is not in T_RUNNING or T_LOCKED state, >>clear handle->h_transaction and skip stop_this_handle(). >> >>Crash stack: >> Call trace: >> stop_this_handle+0x148/0x158 >> jbd2_journal_stop+0x198/0x388 >> __ext4_journal_stop+0x70/0xf0 >> ext4_create+0x12c/0x188 >> lookup_open+0x214/0x6d8 >> do_last+0x364/0x878 >> path_openat+0x6c/0x280 >> do_filp_open+0x70/0xe8 >> do_sys_open+0x178/0x200 >> sys_openat+0x3c/0x50 >> el0_svc_naked+0x44/0x48 >> >>Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >>--- >> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++-- >> 1 file changed, 23 insertions(+), 2 deletions(-) >> >>diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c >>index dca4b5d8aaaa..3779382dbb80 100644 >>--- a/fs/jbd2/transaction.c >>+++ b/fs/jbd2/transaction.c >>@@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, int revoke_records, >> journal = transaction->t_journal; >> tid = transaction->t_tid; >> >>+ jbd2_debug(2, "restarting handle %p\n", handle); >>+ >>+ /* Check if transaction is in invalid state */ >>+ if (transaction->t_state != T_RUNNING && >>+ transaction->t_state != T_LOCKED) { >>+ if (current->journal_info == handle) >>+ current->journal_info = NULL; >>+ handle->h_transaction = NULL; >>+ memalloc_nofs_restore(handle->saved_alloc_context); >>+ goto skip_stop; >>+ } >>+ >> /* >> * First unlink the handle from its current transaction, and start the >> * commit on that. >> */ >>- jbd2_debug(2, "restarting handle %p\n", handle); >> stop_this_handle(handle); >> handle->h_transaction = NULL; >>- >>+skip_stop: >> /* >> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we can >> * get rid of pointless j_state_lock traffic like this. >>@@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle) >> memalloc_nofs_restore(handle->saved_alloc_context); >> goto free_and_exit; >> } >>+ /* Check if transaction is in invalid state */ >>+ if (transaction->t_state != T_RUNNING && >>+ transaction->t_state != T_LOCKED) { >>+ if (current->journal_info == handle) >>+ current->journal_info = NULL; >>+ handle->h_transaction = NULL; >>+ memalloc_nofs_restore(handle->saved_alloc_context); >>+ goto free_and_exit; >>+ } >>+ >> journal = transaction->t_journal; >> tid = transaction->t_tid; >> >>-- >>2.39.2 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle 2026-03-05 13:11 ` liubaolin @ 2026-03-06 1:20 ` yebin (H) 2026-03-06 10:00 ` liubaolin 0 siblings, 1 reply; 6+ messages in thread From: yebin (H) @ 2026-03-06 1:20 UTC (permalink / raw) To: liubaolin, tytso, jack; +Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu On 2026/3/5 21:11, liubaolin wrote: >> Dear maintainers and community, >> >> Our customer reported a kernel crash issue. The kernel crashes in >> stop_this_handle() with: >> J_ASSERT(atomic_read(&transaction->t_updates) > 0); >> >> Crash stack: >> Call trace: >> stop_this_handle+0x148/0x158 >> jbd2_journal_stop+0x198/0x388 >> __ext4_journal_stop+0x70/0xf0 >> ext4_create+0x12c/0x188 >> ... >> >> From the vmcore dump, I found that handle->h_transaction->t_updates is >> 0 and handle->h_transaction->t_state is T_FINISHED. This means the >> handle is still bound to a transaction that has already completed. >> >> In the JBD2 commit process, once a transaction enters T_LOCKED state, >> the commit thread waits for all associated handles to complete >> (t_updates becomes 0). After T_LOCKED completes, the transaction >> transitions to T_FLUSH, T_COMMIT, and eventually T_FINISHED. At this >> point, t_updates is legitimately 0, and there should be no active >> handles bound to this transaction. >> >> This appears to be a low-probability race condition that occurs under >> high concurrency scenarios. The crash happened during ext4_create(), I'm curious about what race condition causes this? >> and at some rare timing point during the execution, a handle ended up >> bound to a transaction that had already completed. The symptom is >> clear: a handle is bound to a T_FINISHED transaction with t_updates == 0. >> >> To prevent this, I added a transaction state check in both >> jbd2_journal_stop() and jbd2__journal_restart() before calling >> stop_this_handle(). >> If the transaction is not in T_RUNNING or T_LOCKED state (i.e., it's >> in T_FLUSH or later states), I clear handle->h_transaction, clear >> current->journal_info if needed, restore the memalloc_nofs context, >> and skip calling stop_this_handle(). This is a defensive check to >> handle the edge case where a handle is bound to a transaction in an >> invalid state. >> If this state occurs, there must be something wrong somewhere. Assertions should also be added to detect such anomalies. I think directly skipping the assertion check is not a good way to solve the problem. >> Please let me know if you have any questions or concerns. >> >> Best regards, >> Baolin Liu > > > > 在 2026/3/5 20:57, Baolin Liu 写道: >> >> Add others >> >> >> >> >> >> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote: >>> From: Baolin Liu <liubaolin@kylinos.cn> >>> >>> When a transaction enters T_FLUSH or later states, >>> handle->h_transaction may still point to it. >>> If jbd2_journal_stop() or jbd2__journal_restart() is called, >>> stop_this_handle() checks t_updates > 0, but t_updates is >>> already 0 for these states, causing a kernel BUG. >>> >>> Fix by checking transaction->t_state in jbd2_journal_stop() >>> and jbd2__journal_restart() before calling stop_this_handle(). >>> If the transaction is not in T_RUNNING or T_LOCKED state, >>> clear handle->h_transaction and skip stop_this_handle(). >>> >>> Crash stack: >>> Call trace: >>> stop_this_handle+0x148/0x158 >>> jbd2_journal_stop+0x198/0x388 >>> __ext4_journal_stop+0x70/0xf0 >>> ext4_create+0x12c/0x188 >>> lookup_open+0x214/0x6d8 >>> do_last+0x364/0x878 >>> path_openat+0x6c/0x280 >>> do_filp_open+0x70/0xe8 >>> do_sys_open+0x178/0x200 >>> sys_openat+0x3c/0x50 >>> el0_svc_naked+0x44/0x48 >>> >>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >>> --- >>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++-- >>> 1 file changed, 23 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c >>> index dca4b5d8aaaa..3779382dbb80 100644 >>> --- a/fs/jbd2/transaction.c >>> +++ b/fs/jbd2/transaction.c >>> @@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int >>> nblocks, int revoke_records, >>> journal = transaction->t_journal; >>> tid = transaction->t_tid; >>> >>> + jbd2_debug(2, "restarting handle %p\n", handle); >>> + >>> + /* Check if transaction is in invalid state */ >>> + if (transaction->t_state != T_RUNNING && >>> + transaction->t_state != T_LOCKED) { >>> + if (current->journal_info == handle) >>> + current->journal_info = NULL; >>> + handle->h_transaction = NULL; >>> + memalloc_nofs_restore(handle->saved_alloc_context); >>> + goto skip_stop; >>> + } >>> + >>> /* >>> * First unlink the handle from its current transaction, and >>> start the >>> * commit on that. >>> */ >>> - jbd2_debug(2, "restarting handle %p\n", handle); >>> stop_this_handle(handle); >>> handle->h_transaction = NULL; >>> - >>> +skip_stop: >>> /* >>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we >>> can >>> * get rid of pointless j_state_lock traffic like this. >>> @@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle) >>> memalloc_nofs_restore(handle->saved_alloc_context); >>> goto free_and_exit; >>> } >>> + /* Check if transaction is in invalid state */ >>> + if (transaction->t_state != T_RUNNING && >>> + transaction->t_state != T_LOCKED) { >>> + if (current->journal_info == handle) >>> + current->journal_info = NULL; >>> + handle->h_transaction = NULL; >>> + memalloc_nofs_restore(handle->saved_alloc_context); >>> + goto free_and_exit; >>> + } >>> + >>> journal = transaction->t_journal; >>> tid = transaction->t_tid; >>> >>> -- >>> 2.39.2 > > > > . > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1] jbd2: check transaction state before stopping handle 2026-03-06 1:20 ` yebin (H) @ 2026-03-06 10:00 ` liubaolin 0 siblings, 0 replies; 6+ messages in thread From: liubaolin @ 2026-03-06 10:00 UTC (permalink / raw) To: yebin (H), tytso, jack; +Cc: linux-ext4, linux-kernel, wangguanyu, Baolin Liu > Dear yebin, > > Thank you for your feedback. > > From the crash, we can see this issue was triggered by the MemTableFlushTh process. > I agree with your point that if this state occurs, there must be something wrong somewhere. > Once we find and fix the root cause, we should add J_ASSERT((transaction->t_state == T_RUNNING) || (transaction->t_state == T_LOCKED)) to detect such anomalies, rather than directly skipping the assertion check. > > I will continue to investigate the root cause and try to find a way to reproduce this issue. > If you have any thoughts or suggestions on this problem, I'd be happy to discuss. > > Best regards, > Baolin Liu 在 2026/3/6 9:20, yebin (H) 写道: > > > On 2026/3/5 21:11, liubaolin wrote: >>> Dear maintainers and community, >>> >>> Our customer reported a kernel crash issue. The kernel crashes in >>> stop_this_handle() with: >>> J_ASSERT(atomic_read(&transaction->t_updates) > 0); >>> >>> Crash stack: >>> Call trace: >>> stop_this_handle+0x148/0x158 >>> jbd2_journal_stop+0x198/0x388 >>> __ext4_journal_stop+0x70/0xf0 >>> ext4_create+0x12c/0x188 >>> ... >>> >>> From the vmcore dump, I found that handle->h_transaction->t_updates is >>> 0 and handle->h_transaction->t_state is T_FINISHED. This means the >>> handle is still bound to a transaction that has already completed. >>> >>> In the JBD2 commit process, once a transaction enters T_LOCKED state, >>> the commit thread waits for all associated handles to complete >>> (t_updates becomes 0). After T_LOCKED completes, the transaction >>> transitions to T_FLUSH, T_COMMIT, and eventually T_FINISHED. At this >>> point, t_updates is legitimately 0, and there should be no active >>> handles bound to this transaction. >>> >>> This appears to be a low-probability race condition that occurs under >>> high concurrency scenarios. The crash happened during ext4_create(), > I'm curious about what race condition causes this? > >>> and at some rare timing point during the execution, a handle ended up >>> bound to a transaction that had already completed. The symptom is >>> clear: a handle is bound to a T_FINISHED transaction with t_updates >>> == 0. >>> >>> To prevent this, I added a transaction state check in both >>> jbd2_journal_stop() and jbd2__journal_restart() before calling >>> stop_this_handle(). >>> If the transaction is not in T_RUNNING or T_LOCKED state (i.e., it's >>> in T_FLUSH or later states), I clear handle->h_transaction, clear >>> current->journal_info if needed, restore the memalloc_nofs context, >>> and skip calling stop_this_handle(). This is a defensive check to >>> handle the edge case where a handle is bound to a transaction in an >>> invalid state. >>> > If this state occurs, there must be something wrong somewhere. > Assertions should also be added to detect such anomalies. I think > directly skipping the assertion check is not a good way to solve the > problem. >>> Please let me know if you have any questions or concerns. >>> >>> Best regards, >>> Baolin Liu >> >> >> >> 在 2026/3/5 20:57, Baolin Liu 写道: >>> >>> Add others >>> >>> >>> >>> >>> >>> At 2026-03-05 20:54:02, "Baolin Liu" <liubaolin12138@163.com> wrote: >>>> From: Baolin Liu <liubaolin@kylinos.cn> >>>> >>>> When a transaction enters T_FLUSH or later states, >>>> handle->h_transaction may still point to it. >>>> If jbd2_journal_stop() or jbd2__journal_restart() is called, >>>> stop_this_handle() checks t_updates > 0, but t_updates is >>>> already 0 for these states, causing a kernel BUG. >>>> >>>> Fix by checking transaction->t_state in jbd2_journal_stop() >>>> and jbd2__journal_restart() before calling stop_this_handle(). >>>> If the transaction is not in T_RUNNING or T_LOCKED state, >>>> clear handle->h_transaction and skip stop_this_handle(). >>>> >>>> Crash stack: >>>> Call trace: >>>> stop_this_handle+0x148/0x158 >>>> jbd2_journal_stop+0x198/0x388 >>>> __ext4_journal_stop+0x70/0xf0 >>>> ext4_create+0x12c/0x188 >>>> lookup_open+0x214/0x6d8 >>>> do_last+0x364/0x878 >>>> path_openat+0x6c/0x280 >>>> do_filp_open+0x70/0xe8 >>>> do_sys_open+0x178/0x200 >>>> sys_openat+0x3c/0x50 >>>> el0_svc_naked+0x44/0x48 >>>> >>>> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> >>>> --- >>>> fs/jbd2/transaction.c | 25 +++++++++++++++++++++++-- >>>> 1 file changed, 23 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c >>>> index dca4b5d8aaaa..3779382dbb80 100644 >>>> --- a/fs/jbd2/transaction.c >>>> +++ b/fs/jbd2/transaction.c >>>> @@ -772,14 +772,25 @@ int jbd2__journal_restart(handle_t *handle, int >>>> nblocks, int revoke_records, >>>> journal = transaction->t_journal; >>>> tid = transaction->t_tid; >>>> >>>> + jbd2_debug(2, "restarting handle %p\n", handle); >>>> + >>>> + /* Check if transaction is in invalid state */ >>>> + if (transaction->t_state != T_RUNNING && >>>> + transaction->t_state != T_LOCKED) { >>>> + if (current->journal_info == handle) >>>> + current->journal_info = NULL; >>>> + handle->h_transaction = NULL; >>>> + memalloc_nofs_restore(handle->saved_alloc_context); >>>> + goto skip_stop; >>>> + } >>>> + >>>> /* >>>> * First unlink the handle from its current transaction, and >>>> start the >>>> * commit on that. >>>> */ >>>> - jbd2_debug(2, "restarting handle %p\n", handle); >>>> stop_this_handle(handle); >>>> handle->h_transaction = NULL; >>>> - >>>> +skip_stop: >>>> /* >>>> * TODO: If we use READ_ONCE / WRITE_ONCE for j_commit_request we >>>> can >>>> * get rid of pointless j_state_lock traffic like this. >>>> @@ -1856,6 +1867,16 @@ int jbd2_journal_stop(handle_t *handle) >>>> memalloc_nofs_restore(handle->saved_alloc_context); >>>> goto free_and_exit; >>>> } >>>> + /* Check if transaction is in invalid state */ >>>> + if (transaction->t_state != T_RUNNING && >>>> + transaction->t_state != T_LOCKED) { >>>> + if (current->journal_info == handle) >>>> + current->journal_info = NULL; >>>> + handle->h_transaction = NULL; >>>> + memalloc_nofs_restore(handle->saved_alloc_context); >>>> + goto free_and_exit; >>>> + } >>>> + >>>> journal = transaction->t_journal; >>>> tid = transaction->t_tid; >>>> >>>> -- >>>> 2.39.2 >> >> >> >> . >> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-06 10:00 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-05 12:54 [PATCH v1] jbd2: check transaction state before stopping handle Baolin Liu
[not found] ` <257c6f1e.a166.19cbe12f387.Coremail.liubaolin12138@163.com>
2026-03-05 13:08 ` Jan Kara
2026-03-06 9:19 ` liubaolin
2026-03-05 13:11 ` liubaolin
2026-03-06 1:20 ` yebin (H)
2026-03-06 10:00 ` liubaolin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox