* Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c [not found] <BCB84D936723884B91E4CC5CA0A7C54AA4F6D082BE@EMEADALEXMB1.corp.nai.org> @ 2011-04-25 23:14 ` Ted Ts'o 2011-04-26 0:23 ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Ted Ts'o @ 2011-04-25 23:14 UTC (permalink / raw) To: Martin_Zielinski; +Cc: linux-ext4 On Thu, Apr 21, 2011 at 09:17:57AM -0500, Martin_Zielinski@McAfee.com wrote: > > I posted this BUG already on the ext3-users list without response. > After making some new observations I hope, that someone here can > tell me these make sense. Kernel output of the BUG is at the end of > the mail. Hi Martin, Thanks for your observations. I don't necessarily always follow mail sent to ext3-users, but fortunately I saw this note sent to the LKML list. > Here's some debug output that I put into the code: > kernel: (fs/ext3/fsync.c, 77): ext3_sync_file: ext3_sync_file datasync=1 d_tid=27807 tid=27846 > kernel: (fs/jbd/journal.c, 467): log_start_commit: log start commit called with commit request=27845, tid=27807 running transaction=ffff8800266913c0 27846 > > So the "really-commited" transaction id was advancing while this > datasync_tid stayed the same and journal.c - log_start_commit() was > called without waking the commit process. > > I wondered what happens if the current journal tid is overflowing > (32bit unsigned integer). By forcing the tid in get_transaction to > jump close to UINT_MAX, I could reproduce the BUG. A simple overflow shouldn't cause the problem, because of how tid_geq() is coded. However, if there have been 2**31 commits since the fdatasync file has been opened, it's possible to trigger this. That's a **lot** of commits, so I'm not sure I'm completely happy with this theory. Nevertheless, I believe this set of patches (one for ext4, and one for ext3), should prevent the crash from happening. - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/2] jbd2: fix fsync() tid wraparound bug 2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o @ 2011-04-26 0:23 ` Theodore Ts'o 2011-04-26 0:23 ` [PATCH 2/2] jbd: " Theodore Ts'o 2011-04-26 9:07 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski 2 siblings, 0 replies; 18+ messages in thread From: Theodore Ts'o @ 2011-04-26 0:23 UTC (permalink / raw) To: Ext4 Developers List; +Cc: Martin_Zielinski, Theodore Ts'o If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd2/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test --- fs/jbd2/journal.c | 18 ++++++++++++++++-- 1 files changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index e0ec3db..a5daa5a 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -479,9 +479,12 @@ int __jbd2_log_space_left(journal_t *journal) int __jbd2_log_start_commit(journal_t *journal, tid_t target) { /* - * Are we already doing a recent enough commit? + * The only transaction we can possibly wait upon is the + * currently running transaction (if it exists). Otherwise, + * the target tid must be an old one. */ - if (!tid_geq(journal->j_commit_request, target)) { + if (journal->j_running_transaction && + journal->j_running_transaction->t_tid == target) { /* * We want a new commit: OK, mark the request and wakeup the * commit thread. We do _not_ do the commit ourselves. @@ -493,6 +496,17 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t target) journal->j_commit_sequence); wake_up(&journal->j_wait_commit); return 1; + } else if (!tid_geq(journal->j_commit_request, target)) { + /* This should never happen, but if it does, preserve + the evidence before kjournald goes into a loop and + increments j_commit_sequence beyond all recognition. */ + pr_err("jbd2: bad log_start_commit: %u %u %u\n", + journal->j_commit_request, journal->j_commit_sequence, + target); + if (journal->j_running_transaction) + pr_err("jbd2: current txn: %u\n", + journal->j_running_transaction->t_tid); + WARN_ON(1); } return 0; } -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o 2011-04-26 0:23 ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o @ 2011-04-26 0:23 ` Theodore Ts'o 2011-04-30 17:17 ` Ted Ts'o 2011-04-26 9:07 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski 2 siblings, 1 reply; 18+ messages in thread From: Theodore Ts'o @ 2011-04-26 0:23 UTC (permalink / raw) To: Ext4 Developers List; +Cc: Martin_Zielinski, Theodore Ts'o If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test --- fs/jbd/journal.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c index b3713af..6b51233 100644 --- a/fs/jbd/journal.c +++ b/fs/jbd/journal.c @@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal) int __log_start_commit(journal_t *journal, tid_t target) { /* - * Are we already doing a recent enough commit? + * The only transaction we can possibly wait upon is the + * currently running transaction (if it exists). Otherwise, + * the target tid must be an old one. */ - if (!tid_geq(journal->j_commit_request, target)) { + if (journal->j_running_transaction && + journal->j_running_transaction->t_tid == target) { /* * We want a new commit: OK, mark the request and wakeup the * commit thread. We do _not_ do the commit ourselves. -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-04-26 0:23 ` [PATCH 2/2] jbd: " Theodore Ts'o @ 2011-04-30 17:17 ` Ted Ts'o 2011-05-02 15:07 ` Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Ted Ts'o @ 2011-04-30 17:17 UTC (permalink / raw) To: Jan Kara; +Cc: Ext4 Developers List, Martin_Zielinski Hi Jan, I don't know if you've been following this thread, but I was wondering if you could review this patch, (a) for inclusion in the ext3 tree, and (b) because I'd appreciate a second pair of eyes looking at this patch, since I intend to push similar change to jbd2. I'm not entirely convinced this is caused by tid's wrapping around, since that would be a huge number of commits, but if it's not that, somehow i_datasync_tid or i_sync_tid is either getting corrupted or not getting set --- and I have no idea how that could be happening. This patch should at least avoid the system from crashing when we hit the case, and harmlessly handle the situation --- with at the worst case, an journal commit that wouldn't otherwise be needed. As background, I've been on this bug for months now, as it's been reported to me as occasionally happening on Android devices that have been using ext4. Since I hadn't seen any reports of this in the field in the x86 world, and this code hadn't changed in a long, long time, I had originally assumed it was an ARM-specific bug. However, recently, Martin Zielinski (on this thread) has reported this problem on an x86 system --- and on a x86 system to boot. Martin suspects it may have to do with sqllite --- which is consistent with I've seen, since I believe Android devices use sqllite quite heavily as well. - Ted jbd: fix fsync() tid wraparound bug If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test, since there's only one valid transaction id that we is valid for us to wait until it is commited: namely, the currently running transaction (if it exists). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- fs/jbd/journal.c | 16 +++++++++++++--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c index b3713af..1b71ce6 100644 --- a/fs/jbd/journal.c +++ b/fs/jbd/journal.c @@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal) int __log_start_commit(journal_t *journal, tid_t target) { /* - * Are we already doing a recent enough commit? + * The only transaction we can possibly wait upon is the + * currently running transaction (if it exists). Otherwise, + * the target tid must be an old one. */ - if (!tid_geq(journal->j_commit_request, target)) { + if (journal->j_running_transaction && + journal->j_running_transaction->t_tid == target) { /* * We want a new commit: OK, mark the request and wakeup the * commit thread. We do _not_ do the commit ourselves. @@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target) journal->j_commit_sequence); wake_up(&journal->j_wait_commit); return 1; - } + } else if (!tid_geq(journal->j_commit_request, target)) + /* This should never happen, but if it does, preserve + the evidence before kjournald goes into a loop and + increments j_commit_sequence beyond all recognition. */ + WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n", + journal->j_commit_request, journal->j_commit_sequence, + target, journal->j_running_transaction ? + journal->j_running_transaction->t_tid : 0); return 0; } ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-04-30 17:17 ` Ted Ts'o @ 2011-05-02 15:07 ` Jan Kara 2011-05-02 18:29 ` Ted Ts'o 0 siblings, 1 reply; 18+ messages in thread From: Jan Kara @ 2011-05-02 15:07 UTC (permalink / raw) To: Ted Ts'o; +Cc: Jan Kara, Ext4 Developers List, Martin_Zielinski Hi Ted, On Sat 30-04-11 13:17:11, Ted Tso wrote: > I don't know if you've been following this thread, but I was wondering > if you could review this patch, (a) for inclusion in the ext3 tree, > and (b) because I'd appreciate a second pair of eyes looking at this > patch, since I intend to push similar change to jbd2. Thanks for forwarding. For some reason I got unsubscribed from linux-ext4 a while ago and didn't notice this since linux-fsdevel goes into the same mailbox. > I'm not entirely convinced this is caused by tid's wrapping around, > since that would be a huge number of commits, but if it's not that, > somehow i_datasync_tid or i_sync_tid is either getting corrupted or > not getting set --- and I have no idea how that could be happening. > This patch should at least avoid the system from crashing when we hit > the case, and harmlessly handle the situation --- with at the worst > case, an journal commit that wouldn't otherwise be needed. The patch looks OK in any case. I'll take it in my tree. It would take about 24 days of constant 1000 trans/s load to trigger this. That's a quite heavy load but not so unrealistic with today's HW. > As background, I've been on this bug for months now, as it's been > reported to me as occasionally happening on Android devices that have > been using ext4. Since I hadn't seen any reports of this in the field > in the x86 world, and this code hadn't changed in a long, long time, I > had originally assumed it was an ARM-specific bug. However, recently, > Martin Zielinski (on this thread) has reported this problem on an x86 > system --- and on a x86 system to boot. > > Martin suspects it may have to do with sqllite --- which is consistent > with I've seen, since I believe Android devices use sqllite quite > heavily as well. Yeah, it may be. Honza > jbd: fix fsync() tid wraparound bug > > If an application program does not make any changes to the indirect > blocks or extent tree, i_datasync_tid will not get updated. If there > are enough commits (i.e., 2**31) such that tid_geq()'s calculations > wrap, and there isn't a currently active transaction at the time of > the fdatasync() call, this can end up triggering a BUG_ON in > fs/jbd/commit.c: > > J_ASSERT(journal->j_running_transaction != NULL); > > It's pretty rare that this can happen, since it requires the use of > fdatasync() plus *very* frequent and excessive use of fsync(). But > with the right workload, it can. > > We fix this by replacing the use of tid_geq() with an equality test, > since there's only one valid transaction id that we is valid for us to > wait until it is commited: namely, the currently running transaction > (if it exists). > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/jbd/journal.c | 16 +++++++++++++--- > 1 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c > index b3713af..1b71ce6 100644 > --- a/fs/jbd/journal.c > +++ b/fs/jbd/journal.c > @@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal) > int __log_start_commit(journal_t *journal, tid_t target) > { > /* > - * Are we already doing a recent enough commit? > + * The only transaction we can possibly wait upon is the > + * currently running transaction (if it exists). Otherwise, > + * the target tid must be an old one. > */ > - if (!tid_geq(journal->j_commit_request, target)) { > + if (journal->j_running_transaction && > + journal->j_running_transaction->t_tid == target) { > /* > * We want a new commit: OK, mark the request and wakeup the > * commit thread. We do _not_ do the commit ourselves. > @@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target) > journal->j_commit_sequence); > wake_up(&journal->j_wait_commit); > return 1; > - } > + } else if (!tid_geq(journal->j_commit_request, target)) > + /* This should never happen, but if it does, preserve > + the evidence before kjournald goes into a loop and > + increments j_commit_sequence beyond all recognition. */ > + WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n", > + journal->j_commit_request, journal->j_commit_sequence, > + target, journal->j_running_transaction ? > + journal->j_running_transaction->t_tid : 0); > return 0; > } > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-02 15:07 ` Jan Kara @ 2011-05-02 18:29 ` Ted Ts'o 2011-05-02 19:04 ` Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Ted Ts'o @ 2011-05-02 18:29 UTC (permalink / raw) To: Jan Kara; +Cc: Ext4 Developers List, Martin_Zielinski On Mon, May 02, 2011 at 05:07:58PM +0200, Jan Kara wrote: > The patch looks OK in any case. I'll take it in my tree. Great, thanks. > It would take > about 24 days of constant 1000 trans/s load to trigger this. That's a quite > heavy load but not so unrealistic with today's HW. True; but one of the reason why I'm not sure I believe that is this bug is showing up on some Android devices, where (a) 1000 trans/second *is* a not just a huge load, but almost impossible to believe, and (b) if we really are doing 2**31 commits, we would be wearing out the flash storage being used on the Android devices! (And we're seeing a handful of failures every week from the testers who participate in a kerneloops-like reporting system.) So while I would very much like to believe that it's caused by a tid wrap, I'm worried there is another bug hiding here.... - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-02 18:29 ` Ted Ts'o @ 2011-05-02 19:04 ` Jan Kara 2011-05-02 21:31 ` Ted Ts'o 0 siblings, 1 reply; 18+ messages in thread From: Jan Kara @ 2011-05-02 19:04 UTC (permalink / raw) To: Ted Ts'o; +Cc: Jan Kara, Ext4 Developers List, Martin_Zielinski On Mon 02-05-11 14:29:45, Ted Tso wrote: > On Mon, May 02, 2011 at 05:07:58PM +0200, Jan Kara wrote: > > The patch looks OK in any case. I'll take it in my tree. > Great, thanks. > > > It would take > > about 24 days of constant 1000 trans/s load to trigger this. That's a quite > > heavy load but not so unrealistic with today's HW. > > True; but one of the reason why I'm not sure I believe that is this > bug is showing up on some Android devices, where (a) 1000 trans/second > *is* a not just a huge load, but almost impossible to believe, and (b) > if we really are doing 2**31 commits, we would be wearing out the > flash storage being used on the Android devices! (And we're seeing a > handful of failures every week from the testers who participate in a > kerneloops-like reporting system.) Ah, OK. I didn't know it's so many reports. Indeed it looks like there's another bug. But then the WARN you have added should tell us more about who's causing the problem, right? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-02 19:04 ` Jan Kara @ 2011-05-02 21:31 ` Ted Ts'o 2011-05-04 14:21 ` Martin_Zielinski 0 siblings, 1 reply; 18+ messages in thread From: Ted Ts'o @ 2011-05-02 21:31 UTC (permalink / raw) To: Jan Kara; +Cc: Ext4 Developers List, Martin_Zielinski On Mon, May 02, 2011 at 09:04:41PM +0200, Jan Kara wrote: > Ah, OK. I didn't know it's so many reports. Indeed it looks like there's > another bug. But then the WARN you have added should tell us more about > who's causing the problem, right? That's the hope. I'm not sure WARNs get logged in Android's reporting system. I need to work the Android folks to figure this out. Worst case I may have to turn the WARN into a BUG for them (although it may be tough to convince them to make such a change). I'm hoping though that the debugging information from Martin's machines will also come through with some useful information that will allow us to root cause whatever is causing this to happen. - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-02 21:31 ` Ted Ts'o @ 2011-05-04 14:21 ` Martin_Zielinski 2011-05-04 21:55 ` Jan Kara 0 siblings, 1 reply; 18+ messages in thread From: Martin_Zielinski @ 2011-05-04 14:21 UTC (permalink / raw) To: tytso, jack; +Cc: linux-ext4 Here's an update. In my first post I was not aware of the implementation of tid_gt. I agree that 2 and a half billion commits on an SD card are - hmph - unlikely The kernels with the patch is not yet installed because we recently provided a kernel with a panic() instead of J_ASSERT and a crashkernel. We now have 5 cores all showing the same behavior in j_commit_sequence vs. j_commit_request. What the cores show (snipped): (gdb) bt #0 crash_setup_regs (regs=0x0) at /usr/src/debug/linux-2.6.32/arch/x86/include/asm/kexec.h:127 #1 crash_kexec (regs=0x0) at kernel/kexec.c:1076 #2 0xffffffff81048872 in panic (fmt=0xffffffff818bdb95 "running transaction is NULL") at kernel/panic.c:82 #3 0xffffffff81159b79 in journal_commit_transaction (journal=0xffff88031e6c4600) at fs/jbd/commit.c:347 #4 0xffffffff8115de49 in kjournald (arg=<value optimized out>) at fs/jbd/journal.c:150 #5 0xffffffff8105fccf in kthread (_create=<value optimized out>) at kernel/kthread.c:78 #6 0xffffffff8100c9da in ?? () #7 0x0000000000000000 in ?? () gdb) p *journal $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = { slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = { raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40, ... j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, j_commit_sequence = 2288014067, j_commit_request = 140530417, ... j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, j_average_commit_time = 69247, j_private = 0xffff88031fd49400} Means that j_commit_request would be considered to be greater than j_commit_sequence and trigger a commit. The j_last_sync_writer has a promising backtrace: crash> bt 4568 PID: 4568 TASK: ffff8802c38ce300 CPU: 8 COMMAND: "core" #0 [ffff8802c38d1da8] schedule at ffffffff8159f5e8 #1 [ffff8802c38d1e50] log_wait_commit at ffffffff8115d4a1 #2 [ffff8802c38d1ec0] ext3_sync_file at ffffffff8113c589 #3 [ffff8802c38d1ef0] vfs_fsync_range at ffffffff81105588 #4 [ffff8802c38d1f30] vfs_fsync at ffffffff81105612 #5 [ffff8802c38d1f40] do_fsync at ffffffff81105646 #6 [ffff8802c38d1f70] sys_fdatasync at ffffffff8110566a #7 [ffff8802c38d1f80] system_call_fastpath at ffffffff8100ba2b RIP: 00007f5be9bd6587 RSP: 0000000047386f38 RFLAGS: 00010202 RAX: 000000000000004b RBX: ffffffff8100ba2b RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000f7 RBP: 00007f5aecf23cc8 R8: 00007f5bc54e56a0 R9: 00000000000011d8 R10: 0000000001c01800 R11: 0000000000000202 R12: ffffffff8110566a R13: ffff8802c38d1f78 R14: 00007f5aecf23b98 R15: 0000000000000000 ORIG_RAX: 000000000000004b CS: 0033 SS: 002b To make a long story short: The inode did not carry the j_commit_request tid. But: crash> foreach files ... 247 ffff88031e65bb00 ffff88026c3ea540 ffff88031f0c0758 REG /opt/data/users/user.db ... (this is an sqlite database) And: (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid $5 = {counter = -2006954411} (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid $3 = {counter = 140530417} > j_commit_request = 140530417 So it *is* a datasync from sqlite. And your fix will catch it. I still don't understand, where this number comes from. Unfortunately I cannot provide the core files as they are coming from customers. But if there is anything I can do to provide further information, please let me know. Cheers, Martin -----Original Message----- From: Ted Ts'o [mailto:tytso@mit.edu] Sent: Montag, 2. Mai 2011 23:31 To: Jan Kara Cc: Ext4 Developers List; Zielinski, Martin Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug On Mon, May 02, 2011 at 09:04:41PM +0200, Jan Kara wrote: > Ah, OK. I didn't know it's so many reports. Indeed it looks like there's > another bug. But then the WARN you have added should tell us more about > who's causing the problem, right? That's the hope. I'm not sure WARNs get logged in Android's reporting system. I need to work the Android folks to figure this out. Worst case I may have to turn the WARN into a BUG for them (although it may be tough to convince them to make such a change). I'm hoping though that the debugging information from Martin's machines will also come through with some useful information that will allow us to root cause whatever is causing this to happen. - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-04 14:21 ` Martin_Zielinski @ 2011-05-04 21:55 ` Jan Kara 2011-05-05 14:11 ` Martin_Zielinski 2011-05-05 14:55 ` Martin_Zielinski 0 siblings, 2 replies; 18+ messages in thread From: Jan Kara @ 2011-05-04 21:55 UTC (permalink / raw) To: Martin_Zielinski; +Cc: tytso, jack, linux-ext4 On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote: > Here's an update. > In my first post I was not aware of the implementation of tid_gt. > I agree that 2 and a half billion commits on an SD card are - hmph - > unlikely <snip> > gdb) p *journal > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, > j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = { > slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = { > raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, > prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, > j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40, > ... > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, > j_commit_sequence = 2288014067, j_commit_request = 140530417, > ... > j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, > j_average_commit_time = 69247, j_private = 0xffff88031fd49400} <snip> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid > $5 = {counter = -2006954411} > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid > $3 = {counter = 140530417} > > > j_commit_request = 140530417 > > So it *is* a datasync from sqlite. And your fix will catch it. > I still don't understand, where this number comes from. Ok, so i_datasync_tid got corrupted. But look at the numbers in hex: i_datasync_tid==140530417==0x86052F1 and i_commit_sequence==2288014067==0x886052F3 So it's a single bit error - we lost the highest bit of the number. Are you getting the cores from different machines? Otherwise I'd suspect the HW. If it's not HW I'm at loss what can cause it... You can try moving i_datasync_tid to a different place in struct ext3_inode_info so that we can rule out / confirm whether some code external to i_datasync_tid handling is just causing random memory corruption... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-04 21:55 ` Jan Kara @ 2011-05-05 14:11 ` Martin_Zielinski 2011-05-05 15:53 ` Jan Kara 2011-05-05 14:55 ` Martin_Zielinski 1 sibling, 1 reply; 18+ messages in thread From: Martin_Zielinski @ 2011-05-05 14:11 UTC (permalink / raw) To: jack; +Cc: tytso, linux-ext4 It is not a hardware bug and very unlikely a race condition or random memory corruption. We have 7 machines that failed with an uptime of 12-13 days. 12 days earlier about 5 different machines failed after the same uptime. All machines were rebooted after this issue. In 5 out of 7 cores the commit request number come from the same sqlite database. In 2 cores I could not find an inode or file structure pointing to this database and hence it could not be verified that the request number comes from this database inode. I'm not so sure about the bit error. Some have the hi-bit set, some not. Due to the implementation of tid_gt() I would expect that the numbers differ in the high bit. The condition produces a wrong result, if the difference between the numbers is greater than INT_MAX. The sequence / request numbers / difference INT_MAX + x: 886052f3 / 086052f1 / 3 61d305fe / e1ce83f4 / 295434 887d10c8 / 087acf05 / 147908 8e3d0b25 / 0e374365 / 378817 702d4061 / f02a5e0b / 189014 73d6775a / f3d67756 / 4 824846ad / 024846ab / 2 On some of the machines we traced the commit_sequence for a short time. However no indication could be found the logs, that a datasync was triggered with always the same tid. Result example: Uptime 5 days, 33 min : commit_sequence: 1135688364 Uptime 12 days, 2:40 : commit_sequence: 1960701710 825013346 commits in 170 hours 4853019 commits per hour => 442 hours or 18 days to have INT_MAX commits. Cheers, Martin -----Original Message----- From: Jan Kara [mailto:jack@suse.cz] Sent: Mittwoch, 4. Mai 2011 23:55 To: Zielinski, Martin Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote: > Here's an update. > In my first post I was not aware of the implementation of tid_gt. > I agree that 2 and a half billion commits on an SD card are - hmph - > unlikely <snip> > gdb) p *journal > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, > j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = { > slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = { > raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, > prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, > j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40, > ... > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, > j_commit_sequence = 2288014067, j_commit_request = 140530417, > ... > j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, > j_average_commit_time = 69247, j_private = 0xffff88031fd49400} <snip> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid > $5 = {counter = -2006954411} > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid > $3 = {counter = 140530417} > > > j_commit_request = 140530417 > > So it *is* a datasync from sqlite. And your fix will catch it. > I still don't understand, where this number comes from. Ok, so i_datasync_tid got corrupted. But look at the numbers in hex: i_datasync_tid==140530417==0x86052F1 and i_commit_sequence==2288014067==0x886052F3 So it's a single bit error - we lost the highest bit of the number. Are you getting the cores from different machines? Otherwise I'd suspect the HW. If it's not HW I'm at loss what can cause it... You can try moving i_datasync_tid to a different place in struct ext3_inode_info so that we can rule out / confirm whether some code external to i_datasync_tid handling is just causing random memory corruption... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-05 14:11 ` Martin_Zielinski @ 2011-05-05 15:53 ` Jan Kara 0 siblings, 0 replies; 18+ messages in thread From: Jan Kara @ 2011-05-05 15:53 UTC (permalink / raw) To: Martin_Zielinski; +Cc: jack, tytso, linux-ext4 On Thu 05-05-11 09:11:22, Martin_Zielinski@McAfee.com wrote: > It is not a hardware bug and very unlikely a race condition or random > memory corruption. We have 7 machines that failed with an uptime of > 12-13 days. 12 days earlier about 5 different machines failed after the > same uptime. All machines were rebooted after this issue. > > In 5 out of 7 cores the commit request number come from the same sqlite > database. In 2 cores I could not find an inode or file structure > pointing to this database and hence it could not be verified that the > request number comes from this database inode. > > I'm not so sure about the bit error. Some have the hi-bit set, some not. > Due to the implementation of tid_gt() I would expect that the numbers > differ in the high bit. The condition produces a wrong result, if the > difference between the numbers is greater than INT_MAX. Yes, sorry. I was confused yesterday. The numbers are exactly such as they should be when an application constantly calls fdatasync() without modifying file metadata. > The sequence / request numbers / difference INT_MAX + x: > 886052f3 / 086052f1 / 3 > 61d305fe / e1ce83f4 / 295434 > 887d10c8 / 087acf05 / 147908 > 8e3d0b25 / 0e374365 / 378817 > 702d4061 / f02a5e0b / 189014 > 73d6775a / f3d67756 / 4 > 824846ad / 024846ab / 2 > V On some of the machines we traced the commit_sequence for a short time. > However no indication could be found the logs, that a datasync was > triggered with always the same tid. > > Result example: > > Uptime 5 days, 33 min : commit_sequence: 1135688364 > Uptime 12 days, 2:40 : commit_sequence: 1960701710 > > 825013346 commits in 170 hours > > 4853019 commits per hour => 442 hours or 18 days to have INT_MAX commits. Yes. In your case it seems we really wrap-around tid. I'll push the fix to Linus in the next merge window and then it will go also to -stable kernels. Thanks for your testing! Honza > -----Original Message----- > From: Jan Kara [mailto:jack@suse.cz] > Sent: Mittwoch, 4. Mai 2011 23:55 > To: Zielinski, Martin > Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org > Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug > > On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote: > > Here's an update. > > In my first post I was not aware of the implementation of tid_gt. > > I agree that 2 and a half billion commits on an SD card are - hmph - > > unlikely > <snip> > > > gdb) p *journal > > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, > > j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = { > > slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = { > > raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, > > prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, > > j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40, > > ... > > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, > > j_commit_sequence = 2288014067, j_commit_request = 140530417, > > ... > > j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, > > j_average_commit_time = 69247, j_private = 0xffff88031fd49400} > <snip> > > > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid > > $5 = {counter = -2006954411} > > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid > > $3 = {counter = 140530417} > > > > > j_commit_request = 140530417 > > > > So it *is* a datasync from sqlite. And your fix will catch it. > > I still don't understand, where this number comes from. > Ok, so i_datasync_tid got corrupted. But look at the numbers in hex: > i_datasync_tid==140530417==0x86052F1 > and > i_commit_sequence==2288014067==0x886052F3 > > So it's a single bit error - we lost the highest bit of the number. Are you > getting the cores from different machines? Otherwise I'd suspect the HW. > If it's not HW I'm at loss what can cause it... You can try moving > i_datasync_tid to a different place in struct ext3_inode_info so that we > can rule out / confirm whether some code external to i_datasync_tid > handling is just causing random memory corruption... > > Honza > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-04 21:55 ` Jan Kara 2011-05-05 14:11 ` Martin_Zielinski @ 2011-05-05 14:55 ` Martin_Zielinski 2011-05-05 15:43 ` Jan Kara 1 sibling, 1 reply; 18+ messages in thread From: Martin_Zielinski @ 2011-05-05 14:55 UTC (permalink / raw) To: jack; +Cc: tytso, linux-ext4 Hello once more. I have one concern against the patch: If the situation is triggered again and again, the patch would produce lots of output. Maybe it's better to use WARN_ONCE. Cheers, Martin -----Original Message----- From: Jan Kara [mailto:jack@suse.cz] Sent: Mittwoch, 4. Mai 2011 23:55 To: Zielinski, Martin Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote: > Here's an update. > In my first post I was not aware of the implementation of tid_gt. > I agree that 2 and a half billion commits on an SD card are - hmph - > unlikely <snip> > gdb) p *journal > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, > j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = { > slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = { > raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, > prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, > j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40, > ... > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, > j_commit_sequence = 2288014067, j_commit_request = 140530417, > ... > j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, > j_average_commit_time = 69247, j_private = 0xffff88031fd49400} <snip> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid > $5 = {counter = -2006954411} > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid > $3 = {counter = 140530417} > > > j_commit_request = 140530417 > > So it *is* a datasync from sqlite. And your fix will catch it. > I still don't understand, where this number comes from. Ok, so i_datasync_tid got corrupted. But look at the numbers in hex: i_datasync_tid==140530417==0x86052F1 and i_commit_sequence==2288014067==0x886052F3 So it's a single bit error - we lost the highest bit of the number. Are you getting the cores from different machines? Otherwise I'd suspect the HW. If it's not HW I'm at loss what can cause it... You can try moving i_datasync_tid to a different place in struct ext3_inode_info so that we can rule out / confirm whether some code external to i_datasync_tid handling is just causing random memory corruption... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug 2011-05-05 14:55 ` Martin_Zielinski @ 2011-05-05 15:43 ` Jan Kara 0 siblings, 0 replies; 18+ messages in thread From: Jan Kara @ 2011-05-05 15:43 UTC (permalink / raw) To: Martin_Zielinski; +Cc: jack, tytso, linux-ext4 On Thu 05-05-11 09:55:22, Martin_Zielinski@McAfee.com wrote: > Hello once more. > I have one concern against the patch: > If the situation is triggered again and again, the patch would produce lots of output. > Maybe it's better to use WARN_ONCE. Yes, probably it will. Changed to WARN_ONCE in the JBD patch. Honza > -----Original Message----- > From: Jan Kara [mailto:jack@suse.cz] > Sent: Mittwoch, 4. Mai 2011 23:55 > To: Zielinski, Martin > Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org > Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug > > On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote: > > Here's an update. > > In my first post I was not aware of the implementation of tid_gt. > > I agree that 2 and a half billion commits on an SD card are - hmph - > > unlikely > <snip> > > > gdb) p *journal > > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, > > j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = { > > slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = { > > raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, > > prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, > > j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40, > > ... > > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, > > j_commit_sequence = 2288014067, j_commit_request = 140530417, > > ... > > j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, > > j_average_commit_time = 69247, j_private = 0xffff88031fd49400} > <snip> > > > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid > > $5 = {counter = -2006954411} > > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid > > $3 = {counter = 140530417} > > > > > j_commit_request = 140530417 > > > > So it *is* a datasync from sqlite. And your fix will catch it. > > I still don't understand, where this number comes from. > Ok, so i_datasync_tid got corrupted. But look at the numbers in hex: > i_datasync_tid==140530417==0x86052F1 > and > i_commit_sequence==2288014067==0x886052F3 > > So it's a single bit error - we lost the highest bit of the number. Are you > getting the cores from different machines? Otherwise I'd suspect the HW. > If it's not HW I'm at loss what can cause it... You can try moving > i_datasync_tid to a different place in struct ext3_inode_info so that we > can rule out / confirm whether some code external to i_datasync_tid > handling is just causing random memory corruption... > > Honza > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c 2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o 2011-04-26 0:23 ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o 2011-04-26 0:23 ` [PATCH 2/2] jbd: " Theodore Ts'o @ 2011-04-26 9:07 ` Martin_Zielinski 2011-04-26 12:23 ` Ted Ts'o 2 siblings, 1 reply; 18+ messages in thread From: Martin_Zielinski @ 2011-04-26 9:07 UTC (permalink / raw) To: tytso; +Cc: linux-ext4 Ted! Thank you a lot! We observed this bug on ~10 out of 40 machines after an uptime from about 3 weeks. All run under comparable conditions. I will have a closer look on the debugfs output to verify if the situation can happen within this short time range. Additionally we installed a crash kernel and I changed the BUG into a panic(). So I will be able to look at the journal structure if this happens again. My testing system, which runs under small load, would need several years to reach this point. But after two weeks of staring on journaling and filesystem code (that I never saw before), this is the only explanation I could find. If I can verify that this is the root cause (or not the root cause), I will post this with information about the user-land part that is responsible. Cheers, Martin -----Original Message----- From: Ted Ts'o [mailto:tytso@mit.edu] Sent: Dienstag, 26. April 2011 01:15 To: Zielinski, Martin Cc: linux-ext4@vger.kernel.org Subject: Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c On Thu, Apr 21, 2011 at 09:17:57AM -0500, Martin_Zielinski@McAfee.com wrote: > > I posted this BUG already on the ext3-users list without response. > After making some new observations I hope, that someone here can > tell me these make sense. Kernel output of the BUG is at the end of > the mail. Hi Martin, Thanks for your observations. I don't necessarily always follow mail sent to ext3-users, but fortunately I saw this note sent to the LKML list. > Here's some debug output that I put into the code: > kernel: (fs/ext3/fsync.c, 77): ext3_sync_file: ext3_sync_file datasync=1 d_tid=27807 tid=27846 > kernel: (fs/jbd/journal.c, 467): log_start_commit: log start commit called with commit request=27845, tid=27807 running transaction=ffff8800266913c0 27846 > > So the "really-commited" transaction id was advancing while this > datasync_tid stayed the same and journal.c - log_start_commit() was > called without waking the commit process. > > I wondered what happens if the current journal tid is overflowing > (32bit unsigned integer). By forcing the tid in get_transaction to > jump close to UINT_MAX, I could reproduce the BUG. A simple overflow shouldn't cause the problem, because of how tid_geq() is coded. However, if there have been 2**31 commits since the fdatasync file has been opened, it's possible to trigger this. That's a **lot** of commits, so I'm not sure I'm completely happy with this theory. Nevertheless, I believe this set of patches (one for ext4, and one for ext3), should prevent the crash from happening. - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c 2011-04-26 9:07 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski @ 2011-04-26 12:23 ` Ted Ts'o 2011-04-26 12:45 ` Martin_Zielinski 0 siblings, 1 reply; 18+ messages in thread From: Ted Ts'o @ 2011-04-26 12:23 UTC (permalink / raw) To: Martin_Zielinski; +Cc: linux-ext4 On Tue, Apr 26, 2011 at 04:07:11AM -0500, Martin_Zielinski@mcafee.com wrote: > Ted! > Thank you a lot! > We observed this bug on ~10 out of 40 machines after an uptime from about 3 weeks. All run under comparable conditions. > I will have a closer look on the debugfs output to verify if the situation can happen within this short time range. Additionally we installed a crash kernel and I changed the BUG into a panic(). > So I will be able to look at the journal structure if this happens again. If you would be willing to install the debugging code that is in the jbd2 patch into the jbd patch, and this put this on your production machines, that would be really great. I can send you a revised jbd patch if that would help (the debugging in code in jbd2 should move over to the jbd patch really simply). - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c 2011-04-26 12:23 ` Ted Ts'o @ 2011-04-26 12:45 ` Martin_Zielinski 2011-04-26 17:20 ` Ted Ts'o 0 siblings, 1 reply; 18+ messages in thread From: Martin_Zielinski @ 2011-04-26 12:45 UTC (permalink / raw) To: tytso; +Cc: linux-ext4 I will port the jbd2 debugging code to jbd an will try to get the new kernel into production. After a reboot we will have to wait several weeks. (Strange: all machines failed within 72 hours). With sqlite I can currently produce ~10.000.000 commits in one hour with a program that does nothing else. I doubt that it is possible to have an overflow in such a short time that we are observing. Maybe the __log_start_commit commit call comes with a corrupt target id from elsewhere. But your patch will catch that, too. Cheers, Martin -----Original Message----- From: Ted Ts'o [mailto:tytso@mit.edu] Sent: Dienstag, 26. April 2011 14:24 To: Zielinski, Martin Cc: linux-ext4@vger.kernel.org Subject: Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c On Tue, Apr 26, 2011 at 04:07:11AM -0500, Martin_Zielinski@mcafee.com wrote: > Ted! > Thank you a lot! > We observed this bug on ~10 out of 40 machines after an uptime from about 3 weeks. All run under comparable conditions. > I will have a closer look on the debugfs output to verify if the situation can happen within this short time range. Additionally we installed a crash kernel and I changed the BUG into a panic(). > So I will be able to look at the journal structure if this happens again. If you would be willing to install the debugging code that is in the jbd2 patch into the jbd patch, and this put this on your production machines, that would be really great. I can send you a revised jbd patch if that would help (the debugging in code in jbd2 should move over to the jbd patch really simply). - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c 2011-04-26 12:45 ` Martin_Zielinski @ 2011-04-26 17:20 ` Ted Ts'o 0 siblings, 0 replies; 18+ messages in thread From: Ted Ts'o @ 2011-04-26 17:20 UTC (permalink / raw) To: Martin_Zielinski; +Cc: linux-ext4 On Tue, Apr 26, 2011 at 07:45:33AM -0500, Martin_Zielinski@mcafee.com wrote: > I will port the jbd2 debugging code to jbd an will try to get the > new kernel into production. After a reboot we will have to wait > several weeks. (Strange: all machines failed within 72 hours). Great, thanks. > With sqlite I can currently produce ~10.000.000 commits in one hour > with a program that does nothing else. I doubt that it is possible > to have an overflow in such a short time that we are observing. > Maybe the __log_start_commit commit call comes with a corrupt target > id from elsewhere. But your patch will catch that, too. Agreed; that's why I don't really believe the wraparound theory. For your convenience, this is the revised (cleaned up) patch for the ext3/jbd (it just cleans up how we print the warning). - Ted commit 4ea00445c7f5d3dfa6219262598a2a8319df07c7 Author: Theodore Ts'o <tytso@mit.edu> Date: Tue Apr 26 13:14:55 2011 -0400 jbd: fix fsync() tid wraparound bug If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test, since there's only one valid transaction id that we is valid for us to wait until it is commited: namely, the currently running transaction (if it exists). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c index b3713af..1b71ce6 100644 --- a/fs/jbd/journal.c +++ b/fs/jbd/journal.c @@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal) int __log_start_commit(journal_t *journal, tid_t target) { /* - * Are we already doing a recent enough commit? + * The only transaction we can possibly wait upon is the + * currently running transaction (if it exists). Otherwise, + * the target tid must be an old one. */ - if (!tid_geq(journal->j_commit_request, target)) { + if (journal->j_running_transaction && + journal->j_running_transaction->t_tid == target) { /* * We want a new commit: OK, mark the request and wakeup the * commit thread. We do _not_ do the commit ourselves. @@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target) journal->j_commit_sequence); wake_up(&journal->j_wait_commit); return 1; - } + } else if (!tid_geq(journal->j_commit_request, target)) + /* This should never happen, but if it does, preserve + the evidence before kjournald goes into a loop and + increments j_commit_sequence beyond all recognition. */ + WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n", + journal->j_commit_request, journal->j_commit_sequence, + target, journal->j_running_transaction ? + journal->j_running_transaction->t_tid : 0); return 0; } ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2011-05-05 15:53 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <BCB84D936723884B91E4CC5CA0A7C54AA4F6D082BE@EMEADALEXMB1.corp.nai.org> 2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o 2011-04-26 0:23 ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o 2011-04-26 0:23 ` [PATCH 2/2] jbd: " Theodore Ts'o 2011-04-30 17:17 ` Ted Ts'o 2011-05-02 15:07 ` Jan Kara 2011-05-02 18:29 ` Ted Ts'o 2011-05-02 19:04 ` Jan Kara 2011-05-02 21:31 ` Ted Ts'o 2011-05-04 14:21 ` Martin_Zielinski 2011-05-04 21:55 ` Jan Kara 2011-05-05 14:11 ` Martin_Zielinski 2011-05-05 15:53 ` Jan Kara 2011-05-05 14:55 ` Martin_Zielinski 2011-05-05 15:43 ` Jan Kara 2011-04-26 9:07 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski 2011-04-26 12:23 ` Ted Ts'o 2011-04-26 12:45 ` Martin_Zielinski 2011-04-26 17:20 ` Ted Ts'o
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).