Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c
       [not found] <BCB84D936723884B91E4CC5CA0A7C54AA4F6D082BE@EMEADALEXMB1.corp.nai.org>
@ 2011-04-25 23:14 ` Ted Ts'o
  2011-04-26  0:23   ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ted Ts'o @ 2011-04-25 23:14 UTC (permalink / raw)
  To: Martin_Zielinski; +Cc: linux-ext4

On Thu, Apr 21, 2011 at 09:17:57AM -0500, Martin_Zielinski@McAfee.com wrote:
> 
> I posted this BUG already on the ext3-users list without response.
> After making some new observations I hope, that someone here can
> tell me these make sense. Kernel output of the BUG is at the end of
> the mail.

Hi Martin,

Thanks for your observations.  I don't necessarily always follow mail
sent to ext3-users, but fortunately I saw this note sent to the LKML
list.  

> Here's some debug output that I put into the code:
> kernel: (fs/ext3/fsync.c, 77): ext3_sync_file: ext3_sync_file datasync=1 d_tid=27807 tid=27846
> kernel: (fs/jbd/journal.c, 467): log_start_commit: log start commit called with commit request=27845, tid=27807 running transaction=ffff8800266913c0 27846
> 
> So the "really-commited" transaction id was advancing while this
> datasync_tid stayed the same and journal.c - log_start_commit() was
> called without waking the commit process.
> 
> I wondered what happens if the current journal tid is overflowing
> (32bit unsigned integer). By forcing the tid in get_transaction to
> jump close to UINT_MAX, I could reproduce the BUG.

A simple overflow shouldn't cause the problem, because of how
tid_geq() is coded.  However, if there have been 2**31 commits since
the fdatasync file has been opened, it's possible to trigger this.
That's a **lot** of commits, so I'm not sure I'm completely happy with
this theory.  Nevertheless, I believe this set of patches (one for
ext4, and one for ext3), should prevent the crash from happening.

      	      	  	 		    	  - Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] jbd2: fix fsync() tid wraparound bug
  2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o
@ 2011-04-26  0:23   ` Theodore Ts'o
  2011-04-26  0:23   ` [PATCH 2/2] jbd: " Theodore Ts'o
  2011-04-26  9:07   ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski
  2 siblings, 0 replies; 18+ messages in thread
From: Theodore Ts'o @ 2011-04-26  0:23 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Martin_Zielinski, Theodore Ts'o

If an application program does not make any changes to the indirect
blocks or extent tree, i_datasync_tid will not get updated.  If there
are enough commits (i.e., 2**31) such that tid_geq()'s calculations
wrap, and there isn't a currently active transaction at the time of
the fdatasync() call, this can end up triggering a BUG_ON in
fs/jbd2/commit.c:

	J_ASSERT(journal->j_running_transaction != NULL);

It's pretty rare that this can happen, since it requires the use of
fdatasync() plus *very* frequent and excessive use of fsync().  But
with the right workload, it can.

We fix this by replacing the use of tid_geq() with an equality test
---
 fs/jbd2/journal.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e0ec3db..a5daa5a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -479,9 +479,12 @@ int __jbd2_log_space_left(journal_t *journal)
 int __jbd2_log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
-	if (!tid_geq(journal->j_commit_request, target)) {
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == target) {
 		/*
 		 * We want a new commit: OK, mark the request and wakeup the
 		 * commit thread.  We do _not_ do the commit ourselves.
@@ -493,6 +496,17 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t target)
 			  journal->j_commit_sequence);
 		wake_up(&journal->j_wait_commit);
 		return 1;
+	} else if (!tid_geq(journal->j_commit_request, target)) {
+		/* This should never happen, but if it does, preserve
+		   the evidence before kjournald goes into a loop and
+		   increments j_commit_sequence beyond all recognition. */
+		pr_err("jbd2: bad log_start_commit: %u %u %u\n",
+		       journal->j_commit_request, journal->j_commit_sequence,
+		       target);
+		if (journal->j_running_transaction)
+			pr_err("jbd2: current txn: %u\n",
+			       journal->j_running_transaction->t_tid);
+		WARN_ON(1);
 	}
 	return 0;
 }
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o
  2011-04-26  0:23   ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o
@ 2011-04-26  0:23   ` Theodore Ts'o
  2011-04-30 17:17     ` Ted Ts'o
  2011-04-26  9:07   ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski
  2 siblings, 1 reply; 18+ messages in thread
From: Theodore Ts'o @ 2011-04-26  0:23 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Martin_Zielinski, Theodore Ts'o

If an application program does not make any changes to the indirect
blocks or extent tree, i_datasync_tid will not get updated.  If there
are enough commits (i.e., 2**31) such that tid_geq()'s calculations
wrap, and there isn't a currently active transaction at the time of
the fdatasync() call, this can end up triggering a BUG_ON in
fs/jbd/commit.c:

	J_ASSERT(journal->j_running_transaction != NULL);

It's pretty rare that this can happen, since it requires the use of
fdatasync() plus *very* frequent and excessive use of fsync().  But
with the right workload, it can.

We fix this by replacing the use of tid_geq() with an equality test
---
 fs/jbd/journal.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index b3713af..6b51233 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal)
 int __log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
-	if (!tid_geq(journal->j_commit_request, target)) {
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == target) {
 		/*
 		 * We want a new commit: OK, mark the request and wakeup the
 		 * commit thread.  We do _not_ do the commit ourselves.
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-04-26  0:23   ` [PATCH 2/2] jbd: " Theodore Ts'o
@ 2011-04-30 17:17     ` Ted Ts'o
  2011-05-02 15:07       ` Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Ted Ts'o @ 2011-04-30 17:17 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, Martin_Zielinski

Hi Jan,

I don't know if you've been following this thread, but I was wondering
if you could review this patch, (a) for inclusion in the ext3 tree,
and (b) because I'd appreciate a second pair of eyes looking at this
patch, since I intend to push similar change to jbd2.

I'm not entirely convinced this is caused by tid's wrapping around,
since that would be a huge number of commits, but if it's not that,
somehow i_datasync_tid or i_sync_tid is either getting corrupted or
not getting set --- and I have no idea how that could be happening.
This patch should at least avoid the system from crashing when we hit
the case, and harmlessly handle the situation --- with at the worst
case, an journal commit that wouldn't otherwise be needed.

As background, I've been on this bug for months now, as it's been
reported to me as occasionally happening on Android devices that have
been using ext4.  Since I hadn't seen any reports of this in the field
in the x86 world, and this code hadn't changed in a long, long time, I
had originally assumed it was an ARM-specific bug.  However, recently,
Martin Zielinski (on this thread) has reported this problem on an x86
system --- and on a x86 system to boot.

Martin suspects it may have to do with sqllite --- which is consistent
with I've seen, since I believe Android devices use sqllite quite
heavily as well.

				- Ted

jbd: fix fsync() tid wraparound bug

If an application program does not make any changes to the indirect
blocks or extent tree, i_datasync_tid will not get updated.  If there
are enough commits (i.e., 2**31) such that tid_geq()'s calculations
wrap, and there isn't a currently active transaction at the time of
the fdatasync() call, this can end up triggering a BUG_ON in
fs/jbd/commit.c:

	J_ASSERT(journal->j_running_transaction != NULL);

It's pretty rare that this can happen, since it requires the use of
fdatasync() plus *very* frequent and excessive use of fsync().  But
with the right workload, it can.

We fix this by replacing the use of tid_geq() with an equality test,
since there's only one valid transaction id that we is valid for us to
wait until it is commited: namely, the currently running transaction
(if it exists).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd/journal.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index b3713af..1b71ce6 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal)
 int __log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
-	if (!tid_geq(journal->j_commit_request, target)) {
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == target) {
 		/*
 		 * We want a new commit: OK, mark the request and wakeup the
 		 * commit thread.  We do _not_ do the commit ourselves.
@@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target)
 			  journal->j_commit_sequence);
 		wake_up(&journal->j_wait_commit);
 		return 1;
-	}
+	} else if (!tid_geq(journal->j_commit_request, target))
+		/* This should never happen, but if it does, preserve
+		   the evidence before kjournald goes into a loop and
+		   increments j_commit_sequence beyond all recognition. */
+		WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n",
+		     journal->j_commit_request, journal->j_commit_sequence,
+		     target, journal->j_running_transaction ? 
+		     journal->j_running_transaction->t_tid : 0);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-04-30 17:17     ` Ted Ts'o
@ 2011-05-02 15:07       ` Jan Kara
  2011-05-02 18:29         ` Ted Ts'o
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2011-05-02 15:07 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Jan Kara, Ext4 Developers List, Martin_Zielinski

  Hi Ted,

On Sat 30-04-11 13:17:11, Ted Tso wrote:
> I don't know if you've been following this thread, but I was wondering
> if you could review this patch, (a) for inclusion in the ext3 tree,
> and (b) because I'd appreciate a second pair of eyes looking at this
> patch, since I intend to push similar change to jbd2.
  Thanks for forwarding. For some reason I got unsubscribed from linux-ext4
a while ago and didn't notice this since linux-fsdevel goes into the same
mailbox.

> I'm not entirely convinced this is caused by tid's wrapping around,
> since that would be a huge number of commits, but if it's not that,
> somehow i_datasync_tid or i_sync_tid is either getting corrupted or
> not getting set --- and I have no idea how that could be happening.
> This patch should at least avoid the system from crashing when we hit
> the case, and harmlessly handle the situation --- with at the worst
> case, an journal commit that wouldn't otherwise be needed.
  The patch looks OK in any case. I'll take it in my tree. It would take
about 24 days of constant 1000 trans/s load to trigger this. That's a quite
heavy load but not so unrealistic with today's HW.

> As background, I've been on this bug for months now, as it's been
> reported to me as occasionally happening on Android devices that have
> been using ext4.  Since I hadn't seen any reports of this in the field
> in the x86 world, and this code hadn't changed in a long, long time, I
> had originally assumed it was an ARM-specific bug.  However, recently,
> Martin Zielinski (on this thread) has reported this problem on an x86
> system --- and on a x86 system to boot.
> 
> Martin suspects it may have to do with sqllite --- which is consistent
> with I've seen, since I believe Android devices use sqllite quite
> heavily as well.
  Yeah, it may be.

								Honza
> jbd: fix fsync() tid wraparound bug
> 
> If an application program does not make any changes to the indirect
> blocks or extent tree, i_datasync_tid will not get updated.  If there
> are enough commits (i.e., 2**31) such that tid_geq()'s calculations
> wrap, and there isn't a currently active transaction at the time of
> the fdatasync() call, this can end up triggering a BUG_ON in
> fs/jbd/commit.c:
> 
> 	J_ASSERT(journal->j_running_transaction != NULL);
> 
> It's pretty rare that this can happen, since it requires the use of
> fdatasync() plus *very* frequent and excessive use of fsync().  But
> with the right workload, it can.
> 
> We fix this by replacing the use of tid_geq() with an equality test,
> since there's only one valid transaction id that we is valid for us to
> wait until it is commited: namely, the currently running transaction
> (if it exists).
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
>  fs/jbd/journal.c |   16 +++++++++++++---
>  1 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
> index b3713af..1b71ce6 100644
> --- a/fs/jbd/journal.c
> +++ b/fs/jbd/journal.c
> @@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal)
>  int __log_start_commit(journal_t *journal, tid_t target)
>  {
>  	/*
> -	 * Are we already doing a recent enough commit?
> +	 * The only transaction we can possibly wait upon is the
> +	 * currently running transaction (if it exists).  Otherwise,
> +	 * the target tid must be an old one.
>  	 */
> -	if (!tid_geq(journal->j_commit_request, target)) {
> +	if (journal->j_running_transaction &&
> +	    journal->j_running_transaction->t_tid == target) {
>  		/*
>  		 * We want a new commit: OK, mark the request and wakeup the
>  		 * commit thread.  We do _not_ do the commit ourselves.
> @@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target)
>  			  journal->j_commit_sequence);
>  		wake_up(&journal->j_wait_commit);
>  		return 1;
> -	}
> +	} else if (!tid_geq(journal->j_commit_request, target))
> +		/* This should never happen, but if it does, preserve
> +		   the evidence before kjournald goes into a loop and
> +		   increments j_commit_sequence beyond all recognition. */
> +		WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n",
> +		     journal->j_commit_request, journal->j_commit_sequence,
> +		     target, journal->j_running_transaction ? 
> +		     journal->j_running_transaction->t_tid : 0);
>  	return 0;
>  }
>  
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-02 15:07       ` Jan Kara
@ 2011-05-02 18:29         ` Ted Ts'o
  2011-05-02 19:04           ` Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Ted Ts'o @ 2011-05-02 18:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, Martin_Zielinski

On Mon, May 02, 2011 at 05:07:58PM +0200, Jan Kara wrote:
>   The patch looks OK in any case. I'll take it in my tree.
Great, thanks.

> It would take
> about 24 days of constant 1000 trans/s load to trigger this. That's a quite
> heavy load but not so unrealistic with today's HW.

True; but one of the reason why I'm not sure I believe that is this
bug is showing up on some Android devices, where (a) 1000 trans/second
*is* a not just a huge load, but almost impossible to believe, and (b)
if we really are doing 2**31 commits, we would be wearing out the
flash storage being used on the Android devices!  (And we're seeing a
handful of failures every week from the testers who participate in a
kerneloops-like reporting system.)

So while I would very much like to believe that it's caused by a tid
wrap, I'm worried there is another bug hiding here....

    		      	 	     	    - Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-02 18:29         ` Ted Ts'o
@ 2011-05-02 19:04           ` Jan Kara
  2011-05-02 21:31             ` Ted Ts'o
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2011-05-02 19:04 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Jan Kara, Ext4 Developers List, Martin_Zielinski

On Mon 02-05-11 14:29:45, Ted Tso wrote:
> On Mon, May 02, 2011 at 05:07:58PM +0200, Jan Kara wrote:
> >   The patch looks OK in any case. I'll take it in my tree.
> Great, thanks.
> 
> > It would take
> > about 24 days of constant 1000 trans/s load to trigger this. That's a quite
> > heavy load but not so unrealistic with today's HW.
> 
> True; but one of the reason why I'm not sure I believe that is this
> bug is showing up on some Android devices, where (a) 1000 trans/second
> *is* a not just a huge load, but almost impossible to believe, and (b)
> if we really are doing 2**31 commits, we would be wearing out the
> flash storage being used on the Android devices!  (And we're seeing a
> handful of failures every week from the testers who participate in a
> kerneloops-like reporting system.)
  Ah, OK. I didn't know it's so many reports. Indeed it looks like there's
another bug. But then the WARN you have added should tell us more about
who's causing the problem, right?

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-02 19:04           ` Jan Kara
@ 2011-05-02 21:31             ` Ted Ts'o
  2011-05-04 14:21               ` Martin_Zielinski
  0 siblings, 1 reply; 18+ messages in thread
From: Ted Ts'o @ 2011-05-02 21:31 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, Martin_Zielinski

On Mon, May 02, 2011 at 09:04:41PM +0200, Jan Kara wrote:
>   Ah, OK. I didn't know it's so many reports. Indeed it looks like there's
> another bug. But then the WARN you have added should tell us more about
> who's causing the problem, right?

That's the hope.  I'm not sure WARNs get logged in Android's reporting
system.  I need to work the Android folks to figure this out.  Worst
case I may have to turn the WARN into a BUG for them (although it may
be tough to convince them to make such a change).  I'm hoping though
that the debugging information from Martin's machines will also come
through with some useful information that will allow us to root cause
whatever is causing this to happen.

							- Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-02 21:31             ` Ted Ts'o
@ 2011-05-04 14:21               ` Martin_Zielinski
  2011-05-04 21:55                 ` Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Martin_Zielinski @ 2011-05-04 14:21 UTC (permalink / raw)
  To: tytso, jack; +Cc: linux-ext4

Here's an update.
In my first post I was not aware of the implementation of tid_gt.
I agree that 2 and a half billion commits on an SD card are - hmph - unlikely

The kernels with the patch is not yet installed because we recently provided a kernel with a panic() instead of J_ASSERT and a crashkernel. 
We now have 5 cores all showing the same behavior in j_commit_sequence vs. j_commit_request.

What the cores show (snipped):
(gdb) bt
#0  crash_setup_regs (regs=0x0) at /usr/src/debug/linux-2.6.32/arch/x86/include/asm/kexec.h:127
#1  crash_kexec (regs=0x0) at kernel/kexec.c:1076
#2  0xffffffff81048872 in panic (fmt=0xffffffff818bdb95 "running transaction is NULL")
    at kernel/panic.c:82
#3  0xffffffff81159b79 in journal_commit_transaction (journal=0xffff88031e6c4600)
    at fs/jbd/commit.c:347
#4  0xffffffff8115de49 in kjournald (arg=<value optimized out>) at fs/jbd/journal.c:150
#5  0xffffffff8105fccf in kthread (_create=<value optimized out>) at kernel/kthread.c:78
#6  0xffffffff8100c9da in ?? ()
#7  0x0000000000000000 in ?? ()

gdb) p *journal
$4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, 
  j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = {
      slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = {
      raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, 
      prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, 
  j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40,
...
j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, 
  j_commit_sequence = 2288014067, j_commit_request = 140530417,
...
  j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, 
  j_average_commit_time = 69247, j_private = 0xffff88031fd49400}

Means that j_commit_request would be considered to be greater than j_commit_sequence and trigger a commit.

The j_last_sync_writer has a promising backtrace:

crash> bt 4568
PID: 4568   TASK: ffff8802c38ce300  CPU: 8   COMMAND: "core"
 #0 [ffff8802c38d1da8] schedule at ffffffff8159f5e8
 #1 [ffff8802c38d1e50] log_wait_commit at ffffffff8115d4a1
 #2 [ffff8802c38d1ec0] ext3_sync_file at ffffffff8113c589
 #3 [ffff8802c38d1ef0] vfs_fsync_range at ffffffff81105588
 #4 [ffff8802c38d1f30] vfs_fsync at ffffffff81105612
 #5 [ffff8802c38d1f40] do_fsync at ffffffff81105646
 #6 [ffff8802c38d1f70] sys_fdatasync at ffffffff8110566a
 #7 [ffff8802c38d1f80] system_call_fastpath at ffffffff8100ba2b
    RIP: 00007f5be9bd6587  RSP: 0000000047386f38  RFLAGS: 00010202
    RAX: 000000000000004b  RBX: ffffffff8100ba2b  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00000000000000f7
    RBP: 00007f5aecf23cc8   R8: 00007f5bc54e56a0   R9: 00000000000011d8
    R10: 0000000001c01800  R11: 0000000000000202  R12: ffffffff8110566a
    R13: ffff8802c38d1f78  R14: 00007f5aecf23b98  R15: 0000000000000000
    ORIG_RAX: 000000000000004b  CS: 0033  SS: 002b

To make a long story short: The inode did not carry the j_commit_request tid.

But:
crash> foreach files
...
247 ffff88031e65bb00 ffff88026c3ea540 ffff88031f0c0758 REG  /opt/data/users/user.db
...
(this is an sqlite database)

And:
(gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid
$5 = {counter = -2006954411}
(gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid
$3 = {counter = 140530417}

> j_commit_request = 140530417

So it *is* a datasync from sqlite. And your fix will catch it. 
I still don't understand, where this number comes from. 

Unfortunately I cannot provide the core files as they are coming from customers. 
But if there is anything I can do to provide further information, please let me know.

Cheers,
Martin

-----Original Message-----
From: Ted Ts'o [mailto:tytso@mit.edu] 
Sent: Montag, 2. Mai 2011 23:31
To: Jan Kara
Cc: Ext4 Developers List; Zielinski, Martin
Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug

On Mon, May 02, 2011 at 09:04:41PM +0200, Jan Kara wrote:
>   Ah, OK. I didn't know it's so many reports. Indeed it looks like there's
> another bug. But then the WARN you have added should tell us more about
> who's causing the problem, right?

That's the hope.  I'm not sure WARNs get logged in Android's reporting
system.  I need to work the Android folks to figure this out.  Worst
case I may have to turn the WARN into a BUG for them (although it may
be tough to convince them to make such a change).  I'm hoping though
that the debugging information from Martin's machines will also come
through with some useful information that will allow us to root cause
whatever is causing this to happen.

							- Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-04 14:21               ` Martin_Zielinski
@ 2011-05-04 21:55                 ` Jan Kara
  2011-05-05 14:11                   ` Martin_Zielinski
  2011-05-05 14:55                   ` Martin_Zielinski
  0 siblings, 2 replies; 18+ messages in thread
From: Jan Kara @ 2011-05-04 21:55 UTC (permalink / raw)
  To: Martin_Zielinski; +Cc: tytso, jack, linux-ext4

On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote:
> Here's an update.
> In my first post I was not aware of the implementation of tid_gt.
> I agree that 2 and a half billion commits on an SD card are - hmph -
> unlikely
  <snip>

> gdb) p *journal
> $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, 
>   j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = {
>       slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = {
>       raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, 
>       prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, 
>   j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40,
> ...
> j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, 
>   j_commit_sequence = 2288014067, j_commit_request = 140530417,
> ...
>   j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, 
>   j_average_commit_time = 69247, j_private = 0xffff88031fd49400}
  <snip>

> (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid
> $5 = {counter = -2006954411}
> (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid
> $3 = {counter = 140530417}
> 
> > j_commit_request = 140530417
> 
> So it *is* a datasync from sqlite. And your fix will catch it. 
> I still don't understand, where this number comes from. 
  Ok, so i_datasync_tid got corrupted. But look at the numbers in hex:
i_datasync_tid==140530417==0x86052F1
and
i_commit_sequence==2288014067==0x886052F3

So it's a single bit error - we lost the highest bit of the number. Are you
getting the cores from different machines? Otherwise I'd suspect the HW.
If it's not HW I'm at loss what can cause it... You can try moving
i_datasync_tid to a different place in struct ext3_inode_info so that we
can rule out / confirm whether some code external to i_datasync_tid
handling is just causing random memory corruption...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-04 21:55                 ` Jan Kara
@ 2011-05-05 14:11                   ` Martin_Zielinski
  2011-05-05 15:53                     ` Jan Kara
  2011-05-05 14:55                   ` Martin_Zielinski
  1 sibling, 1 reply; 18+ messages in thread
From: Martin_Zielinski @ 2011-05-05 14:11 UTC (permalink / raw)
  To: jack; +Cc: tytso, linux-ext4

It is not a hardware bug and very unlikely a race condition or random memory corruption.
We have 7 machines that failed with an uptime of 12-13 days.
12 days earlier about 5 different machines failed after the same uptime. All machines were rebooted
after this issue.

In 5 out of 7 cores the commit request number come from the same sqlite database.
In 2 cores I could not find an inode or file structure pointing to this database and hence
it could not be verified that the request number comes from this database inode.

I'm not so sure about the bit error. Some have the hi-bit set, some not.
Due to the implementation of tid_gt() I would  expect that the numbers differ in the high bit.
The condition produces a wrong result, if the difference between the numbers is greater than INT_MAX.

The sequence / request numbers / difference INT_MAX + x:
886052f3 / 086052f1 / 3
61d305fe / e1ce83f4 / 295434
887d10c8 / 087acf05 / 147908
8e3d0b25 / 0e374365 / 378817
702d4061 / f02a5e0b / 189014
73d6775a / f3d67756 / 4
824846ad / 024846ab / 2

On some of the machines we traced the commit_sequence for a short time.
However no indication could be found the logs, that a datasync was triggered with always the same tid.

Result example:

Uptime 5 days, 33 min : commit_sequence: 1135688364
Uptime 12 days,  2:40 : commit_sequence: 1960701710

825013346 commits in 170 hours

4853019 commits per hour => 442 hours or 18 days to have INT_MAX commits.

Cheers,
Martin

-----Original Message-----
From: Jan Kara [mailto:jack@suse.cz]
Sent: Mittwoch, 4. Mai 2011 23:55
To: Zielinski, Martin
Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug

On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote:
> Here's an update.
> In my first post I was not aware of the implementation of tid_gt.
> I agree that 2 and a half billion commits on an SD card are - hmph -
> unlikely
  <snip>

> gdb) p *journal
> $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8,
>   j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = {
>       slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = {
>       raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638,
>       prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0,
>   j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40,
> ...
> j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068,
>   j_commit_sequence = 2288014067, j_commit_request = 140530417,
> ...
>   j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568,
>   j_average_commit_time = 69247, j_private = 0xffff88031fd49400}
  <snip>

> (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid
> $5 = {counter = -2006954411}
> (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid
> $3 = {counter = 140530417}
>
> > j_commit_request = 140530417
>
> So it *is* a datasync from sqlite. And your fix will catch it.
> I still don't understand, where this number comes from.
  Ok, so i_datasync_tid got corrupted. But look at the numbers in hex:
i_datasync_tid==140530417==0x86052F1
and
i_commit_sequence==2288014067==0x886052F3

So it's a single bit error - we lost the highest bit of the number. Are you
getting the cores from different machines? Otherwise I'd suspect the HW.
If it's not HW I'm at loss what can cause it... You can try moving
i_datasync_tid to a different place in struct ext3_inode_info so that we
can rule out / confirm whether some code external to i_datasync_tid
handling is just causing random memory corruption...

                                                                Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-05 14:11                   ` Martin_Zielinski
@ 2011-05-05 15:53                     ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2011-05-05 15:53 UTC (permalink / raw)
  To: Martin_Zielinski; +Cc: jack, tytso, linux-ext4

On Thu 05-05-11 09:11:22, Martin_Zielinski@McAfee.com wrote:
> It is not a hardware bug and very unlikely a race condition or random
> memory corruption.  We have 7 machines that failed with an uptime of
> 12-13 days.  12 days earlier about 5 different machines failed after the
> same uptime. All machines were rebooted after this issue.
> 
> In 5 out of 7 cores the commit request number come from the same sqlite
> database.  In 2 cores I could not find an inode or file structure
> pointing to this database and hence it could not be verified that the
> request number comes from this database inode.
> 
> I'm not so sure about the bit error. Some have the hi-bit set, some not.
> Due to the implementation of tid_gt() I would  expect that the numbers
> differ in the high bit.  The condition produces a wrong result, if the
> difference between the numbers is greater than INT_MAX.
   Yes, sorry. I was confused yesterday. The numbers are exactly such as
they should be when an application constantly calls fdatasync() without
modifying file metadata. 
 
> The sequence / request numbers / difference INT_MAX + x:
> 886052f3 / 086052f1 / 3
> 61d305fe / e1ce83f4 / 295434
> 887d10c8 / 087acf05 / 147908
> 8e3d0b25 / 0e374365 / 378817
> 702d4061 / f02a5e0b / 189014
> 73d6775a / f3d67756 / 4
> 824846ad / 024846ab / 2
> 
V On some of the machines we traced the commit_sequence for a short time.
> However no indication could be found the logs, that a datasync was
> triggered with always the same tid.
> 
> Result example:
> 
> Uptime 5 days, 33 min : commit_sequence: 1135688364
> Uptime 12 days,  2:40 : commit_sequence: 1960701710
> 
> 825013346 commits in 170 hours
> 
> 4853019 commits per hour => 442 hours or 18 days to have INT_MAX commits.
  Yes. In your case it seems we really wrap-around tid. I'll push the fix
to Linus in the next merge window and then it will go also to -stable
kernels. Thanks for your testing!

									Honza
> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Mittwoch, 4. Mai 2011 23:55
> To: Zielinski, Martin
> Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
> 
> On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote:
> > Here's an update.
> > In my first post I was not aware of the implementation of tid_gt.
> > I agree that 2 and a half billion commits on an SD card are - hmph -
> > unlikely
>   <snip>
> 
> > gdb) p *journal
> > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8,
> >   j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = {
> >       slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = {
> >       raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638,
> >       prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0,
> >   j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40,
> > ...
> > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068,
> >   j_commit_sequence = 2288014067, j_commit_request = 140530417,
> > ...
> >   j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568,
> >   j_average_commit_time = 69247, j_private = 0xffff88031fd49400}
>   <snip>
> 
> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid
> > $5 = {counter = -2006954411}
> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid
> > $3 = {counter = 140530417}
> >
> > > j_commit_request = 140530417
> >
> > So it *is* a datasync from sqlite. And your fix will catch it.
> > I still don't understand, where this number comes from.
>   Ok, so i_datasync_tid got corrupted. But look at the numbers in hex:
> i_datasync_tid==140530417==0x86052F1
> and
> i_commit_sequence==2288014067==0x886052F3
> 
> So it's a single bit error - we lost the highest bit of the number. Are you
> getting the cores from different machines? Otherwise I'd suspect the HW.
> If it's not HW I'm at loss what can cause it... You can try moving
> i_datasync_tid to a different place in struct ext3_inode_info so that we
> can rule out / confirm whether some code external to i_datasync_tid
> handling is just causing random memory corruption...
> 
>                                                                 Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-04 21:55                 ` Jan Kara
  2011-05-05 14:11                   ` Martin_Zielinski
@ 2011-05-05 14:55                   ` Martin_Zielinski
  2011-05-05 15:43                     ` Jan Kara
  1 sibling, 1 reply; 18+ messages in thread
From: Martin_Zielinski @ 2011-05-05 14:55 UTC (permalink / raw)
  To: jack; +Cc: tytso, linux-ext4

Hello once more.
I have one concern against the patch:
If the situation is triggered again and again, the patch would produce lots of output.
Maybe it's better to use WARN_ONCE.

Cheers,
Martin

-----Original Message-----
From: Jan Kara [mailto:jack@suse.cz] 
Sent: Mittwoch, 4. Mai 2011 23:55
To: Zielinski, Martin
Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug

On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote:
> Here's an update.
> In my first post I was not aware of the implementation of tid_gt.
> I agree that 2 and a half billion commits on an SD card are - hmph -
> unlikely
  <snip>

> gdb) p *journal
> $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, 
>   j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = {
>       slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = {
>       raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, 
>       prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, 
>   j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40,
> ...
> j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, 
>   j_commit_sequence = 2288014067, j_commit_request = 140530417,
> ...
>   j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, 
>   j_average_commit_time = 69247, j_private = 0xffff88031fd49400}
  <snip>

> (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid
> $5 = {counter = -2006954411}
> (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid
> $3 = {counter = 140530417}
> 
> > j_commit_request = 140530417
> 
> So it *is* a datasync from sqlite. And your fix will catch it. 
> I still don't understand, where this number comes from. 
  Ok, so i_datasync_tid got corrupted. But look at the numbers in hex:
i_datasync_tid==140530417==0x86052F1
and
i_commit_sequence==2288014067==0x886052F3

So it's a single bit error - we lost the highest bit of the number. Are you
getting the cores from different machines? Otherwise I'd suspect the HW.
If it's not HW I'm at loss what can cause it... You can try moving
i_datasync_tid to a different place in struct ext3_inode_info so that we
can rule out / confirm whether some code external to i_datasync_tid
handling is just causing random memory corruption...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
  2011-05-05 14:55                   ` Martin_Zielinski
@ 2011-05-05 15:43                     ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2011-05-05 15:43 UTC (permalink / raw)
  To: Martin_Zielinski; +Cc: jack, tytso, linux-ext4

On Thu 05-05-11 09:55:22, Martin_Zielinski@McAfee.com wrote:
> Hello once more.
> I have one concern against the patch:
> If the situation is triggered again and again, the patch would produce lots of output.
> Maybe it's better to use WARN_ONCE.
  Yes, probably it will. Changed to WARN_ONCE in the JBD patch.

								Honza
> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz] 
> Sent: Mittwoch, 4. Mai 2011 23:55
> To: Zielinski, Martin
> Cc: tytso@mit.edu; jack@suse.cz; linux-ext4@vger.kernel.org
> Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
> 
> On Wed 04-05-11 09:21:04, Martin_Zielinski@McAfee.com wrote:
> > Here's an update.
> > In my first post I was not aware of the implementation of tid_gt.
> > I agree that 2 and a half billion commits on an SD card are - hmph -
> > unlikely
>   <snip>
> 
> > gdb) p *journal
> > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, 
> >   j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = {
> >       slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = {
> >       raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, 
> >       prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, 
> >   j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40,
> > ...
> > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, 
> >   j_commit_sequence = 2288014067, j_commit_request = 140530417,
> > ...
> >   j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, 
> >   j_average_commit_time = 69247, j_private = 0xffff88031fd49400}
>   <snip>
> 
> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid
> > $5 = {counter = -2006954411}
> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid
> > $3 = {counter = 140530417}
> > 
> > > j_commit_request = 140530417
> > 
> > So it *is* a datasync from sqlite. And your fix will catch it. 
> > I still don't understand, where this number comes from. 
>   Ok, so i_datasync_tid got corrupted. But look at the numbers in hex:
> i_datasync_tid==140530417==0x86052F1
> and
> i_commit_sequence==2288014067==0x886052F3
> 
> So it's a single bit error - we lost the highest bit of the number. Are you
> getting the cores from different machines? Otherwise I'd suspect the HW.
> If it's not HW I'm at loss what can cause it... You can try moving
> i_datasync_tid to a different place in struct ext3_inode_info so that we
> can rule out / confirm whether some code external to i_datasync_tid
> handling is just causing random memory corruption...
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c
  2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o
  2011-04-26  0:23   ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o
  2011-04-26  0:23   ` [PATCH 2/2] jbd: " Theodore Ts'o
@ 2011-04-26  9:07   ` Martin_Zielinski
  2011-04-26 12:23     ` Ted Ts'o
  2 siblings, 1 reply; 18+ messages in thread
From: Martin_Zielinski @ 2011-04-26  9:07 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

Ted!
Thank you a lot!
We observed this bug on ~10 out of 40 machines after an uptime from about 3 weeks. All run under comparable conditions.
I will have a closer look on the debugfs output to verify if the situation can  happen within this short time range. Additionally we installed a crash kernel and I changed the BUG into a panic(). 
So I will be able to look at the journal structure if this happens again.

My testing system, which runs under small load, would need several years to reach this point. 
But after two weeks of staring on journaling and filesystem code (that I never saw before), this is the only explanation I could find.
If I can verify that this is the root cause (or not the root cause), I will post this with information about the user-land part that is responsible.

Cheers,
Martin

-----Original Message-----
From: Ted Ts'o [mailto:tytso@mit.edu] 
Sent: Dienstag, 26. April 2011 01:15
To: Zielinski, Martin
Cc: linux-ext4@vger.kernel.org
Subject: Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c

On Thu, Apr 21, 2011 at 09:17:57AM -0500, Martin_Zielinski@McAfee.com wrote:
> 
> I posted this BUG already on the ext3-users list without response.
> After making some new observations I hope, that someone here can
> tell me these make sense. Kernel output of the BUG is at the end of
> the mail.

Hi Martin,

Thanks for your observations.  I don't necessarily always follow mail
sent to ext3-users, but fortunately I saw this note sent to the LKML
list.  

> Here's some debug output that I put into the code:
> kernel: (fs/ext3/fsync.c, 77): ext3_sync_file: ext3_sync_file datasync=1 d_tid=27807 tid=27846
> kernel: (fs/jbd/journal.c, 467): log_start_commit: log start commit called with commit request=27845, tid=27807 running transaction=ffff8800266913c0 27846
> 
> So the "really-commited" transaction id was advancing while this
> datasync_tid stayed the same and journal.c - log_start_commit() was
> called without waking the commit process.
> 
> I wondered what happens if the current journal tid is overflowing
> (32bit unsigned integer). By forcing the tid in get_transaction to
> jump close to UINT_MAX, I could reproduce the BUG.

A simple overflow shouldn't cause the problem, because of how
tid_geq() is coded.  However, if there have been 2**31 commits since
the fdatasync file has been opened, it's possible to trigger this.
That's a **lot** of commits, so I'm not sure I'm completely happy with
this theory.  Nevertheless, I believe this set of patches (one for
ext4, and one for ext3), should prevent the crash from happening.

      	      	  	 		    	  - Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c
  2011-04-26  9:07   ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski
@ 2011-04-26 12:23     ` Ted Ts'o
  2011-04-26 12:45       ` Martin_Zielinski
  0 siblings, 1 reply; 18+ messages in thread
From: Ted Ts'o @ 2011-04-26 12:23 UTC (permalink / raw)
  To: Martin_Zielinski; +Cc: linux-ext4

On Tue, Apr 26, 2011 at 04:07:11AM -0500, Martin_Zielinski@mcafee.com wrote:
> Ted!
> Thank you a lot!
> We observed this bug on ~10 out of 40 machines after an uptime from about 3 weeks. All run under comparable conditions.
> I will have a closer look on the debugfs output to verify if the situation can  happen within this short time range. Additionally we installed a crash kernel and I changed the BUG into a panic(). 
> So I will be able to look at the journal structure if this happens again.

If you would be willing to install the debugging code that is in the
jbd2 patch into the jbd patch, and this put this on your production
machines, that would be really great.  I can send you a revised jbd
patch if that would help (the debugging in code in jbd2 should move
over to the jbd patch really simply).

							- Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c
  2011-04-26 12:23     ` Ted Ts'o
@ 2011-04-26 12:45       ` Martin_Zielinski
  2011-04-26 17:20         ` Ted Ts'o
  0 siblings, 1 reply; 18+ messages in thread
From: Martin_Zielinski @ 2011-04-26 12:45 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

I will port the jbd2 debugging code to jbd an will try to get the new kernel into production.
After a reboot we will have to wait several weeks. (Strange: all machines failed within 72 hours).

With sqlite I can currently produce ~10.000.000 commits in one hour with a program that does nothing else.
I doubt that it is possible to have an overflow in such a short time that we are observing.
Maybe the __log_start_commit commit call comes with a corrupt target id from elsewhere. But your patch will catch that, too.

Cheers,
Martin

-----Original Message-----
From: Ted Ts'o [mailto:tytso@mit.edu] 
Sent: Dienstag, 26. April 2011 14:24
To: Zielinski, Martin
Cc: linux-ext4@vger.kernel.org
Subject: Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c

On Tue, Apr 26, 2011 at 04:07:11AM -0500, Martin_Zielinski@mcafee.com wrote:
> Ted!
> Thank you a lot!
> We observed this bug on ~10 out of 40 machines after an uptime from about 3 weeks. All run under comparable conditions.
> I will have a closer look on the debugfs output to verify if the situation can  happen within this short time range. Additionally we installed a crash kernel and I changed the BUG into a panic(). 
> So I will be able to look at the journal structure if this happens again.

If you would be willing to install the debugging code that is in the
jbd2 patch into the jbd patch, and this put this on your production
machines, that would be really great.  I can send you a revised jbd
patch if that would help (the debugging in code in jbd2 should move
over to the jbd patch really simply).

							- Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c
  2011-04-26 12:45       ` Martin_Zielinski
@ 2011-04-26 17:20         ` Ted Ts'o
  0 siblings, 0 replies; 18+ messages in thread
From: Ted Ts'o @ 2011-04-26 17:20 UTC (permalink / raw)
  To: Martin_Zielinski; +Cc: linux-ext4

On Tue, Apr 26, 2011 at 07:45:33AM -0500, Martin_Zielinski@mcafee.com wrote:

> I will port the jbd2 debugging code to jbd an will try to get the
> new kernel into production.  After a reboot we will have to wait
> several weeks. (Strange: all machines failed within 72 hours).

Great, thanks.

> With sqlite I can currently produce ~10.000.000 commits in one hour
> with a program that does nothing else.  I doubt that it is possible
> to have an overflow in such a short time that we are observing.
> Maybe the __log_start_commit commit call comes with a corrupt target
> id from elsewhere. But your patch will catch that, too.

Agreed; that's why I don't really believe the wraparound theory.  For
your convenience, this is the revised (cleaned up) patch for the
ext3/jbd (it just cleans up how we print the warning).

						- Ted

commit 4ea00445c7f5d3dfa6219262598a2a8319df07c7
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Tue Apr 26 13:14:55 2011 -0400

    jbd: fix fsync() tid wraparound bug
    
    If an application program does not make any changes to the indirect
    blocks or extent tree, i_datasync_tid will not get updated.  If there
    are enough commits (i.e., 2**31) such that tid_geq()'s calculations
    wrap, and there isn't a currently active transaction at the time of
    the fdatasync() call, this can end up triggering a BUG_ON in
    fs/jbd/commit.c:
    
    	J_ASSERT(journal->j_running_transaction != NULL);
    
    It's pretty rare that this can happen, since it requires the use of
    fdatasync() plus *very* frequent and excessive use of fsync().  But
    with the right workload, it can.
    
    We fix this by replacing the use of tid_geq() with an equality test,
    since there's only one valid transaction id that we is valid for us to
    wait until it is commited: namely, the currently running transaction
    (if it exists).
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index b3713af..1b71ce6 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal)
 int __log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
-	if (!tid_geq(journal->j_commit_request, target)) {
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == target) {
 		/*
 		 * We want a new commit: OK, mark the request and wakeup the
 		 * commit thread.  We do _not_ do the commit ourselves.
@@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target)
 			  journal->j_commit_sequence);
 		wake_up(&journal->j_wait_commit);
 		return 1;
-	}
+	} else if (!tid_geq(journal->j_commit_request, target))
+		/* This should never happen, but if it does, preserve
+		   the evidence before kjournald goes into a loop and
+		   increments j_commit_sequence beyond all recognition. */
+		WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n",
+		     journal->j_commit_request, journal->j_commit_sequence,
+		     target, journal->j_running_transaction ? 
+		     journal->j_running_transaction->t_tid : 0);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-05-05 15:53 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <BCB84D936723884B91E4CC5CA0A7C54AA4F6D082BE@EMEADALEXMB1.corp.nai.org>
2011-04-25 23:14 ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Ted Ts'o
2011-04-26  0:23   ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o
2011-04-26  0:23   ` [PATCH 2/2] jbd: " Theodore Ts'o
2011-04-30 17:17     ` Ted Ts'o
2011-05-02 15:07       ` Jan Kara
2011-05-02 18:29         ` Ted Ts'o
2011-05-02 19:04           ` Jan Kara
2011-05-02 21:31             ` Ted Ts'o
2011-05-04 14:21               ` Martin_Zielinski
2011-05-04 21:55                 ` Jan Kara
2011-05-05 14:11                   ` Martin_Zielinski
2011-05-05 15:53                     ` Jan Kara
2011-05-05 14:55                   ` Martin_Zielinski
2011-05-05 15:43                     ` Jan Kara
2011-04-26  9:07   ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski
2011-04-26 12:23     ` Ted Ts'o
2011-04-26 12:45       ` Martin_Zielinski
2011-04-26 17:20         ` Ted Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).