All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ted Ts'o <tytso@mit.edu>
To: Martin_Zielinski@mcafee.com
Cc: linux-ext4@vger.kernel.org
Subject: Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c
Date: Tue, 26 Apr 2011 13:20:57 -0400	[thread overview]
Message-ID: <20110426172057.GH9486@thunk.org> (raw)
In-Reply-To: <BCB84D936723884B91E4CC5CA0A7C54AA4F6D085B8@EMEADALEXMB1.corp.nai.org>

On Tue, Apr 26, 2011 at 07:45:33AM -0500, Martin_Zielinski@mcafee.com wrote:

> I will port the jbd2 debugging code to jbd an will try to get the
> new kernel into production.  After a reboot we will have to wait
> several weeks. (Strange: all machines failed within 72 hours).

Great, thanks.

> With sqlite I can currently produce ~10.000.000 commits in one hour
> with a program that does nothing else.  I doubt that it is possible
> to have an overflow in such a short time that we are observing.
> Maybe the __log_start_commit commit call comes with a corrupt target
> id from elsewhere. But your patch will catch that, too.

Agreed; that's why I don't really believe the wraparound theory.  For
your convenience, this is the revised (cleaned up) patch for the
ext3/jbd (it just cleans up how we print the warning).

						- Ted

commit 4ea00445c7f5d3dfa6219262598a2a8319df07c7
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Tue Apr 26 13:14:55 2011 -0400

    jbd: fix fsync() tid wraparound bug
    
    If an application program does not make any changes to the indirect
    blocks or extent tree, i_datasync_tid will not get updated.  If there
    are enough commits (i.e., 2**31) such that tid_geq()'s calculations
    wrap, and there isn't a currently active transaction at the time of
    the fdatasync() call, this can end up triggering a BUG_ON in
    fs/jbd/commit.c:
    
    	J_ASSERT(journal->j_running_transaction != NULL);
    
    It's pretty rare that this can happen, since it requires the use of
    fdatasync() plus *very* frequent and excessive use of fsync().  But
    with the right workload, it can.
    
    We fix this by replacing the use of tid_geq() with an equality test,
    since there's only one valid transaction id that we is valid for us to
    wait until it is commited: namely, the currently running transaction
    (if it exists).
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index b3713af..1b71ce6 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal)
 int __log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
-	if (!tid_geq(journal->j_commit_request, target)) {
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == target) {
 		/*
 		 * We want a new commit: OK, mark the request and wakeup the
 		 * commit thread.  We do _not_ do the commit ourselves.
@@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target)
 			  journal->j_commit_sequence);
 		wake_up(&journal->j_wait_commit);
 		return 1;
-	}
+	} else if (!tid_geq(journal->j_commit_request, target))
+		/* This should never happen, but if it does, preserve
+		   the evidence before kjournald goes into a loop and
+		   increments j_commit_sequence beyond all recognition. */
+		WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n",
+		     journal->j_commit_request, journal->j_commit_sequence,
+		     target, journal->j_running_transaction ? 
+		     journal->j_running_transaction->t_tid : 0);
 	return 0;
 }
 

      reply	other threads:[~2011-04-26 17:21 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-21 14:17 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski
2011-04-25 23:14 ` Ted Ts'o
2011-04-26  0:23   ` [PATCH 1/2] jbd2: fix fsync() tid wraparound bug Theodore Ts'o
2011-04-26  0:23   ` [PATCH 2/2] jbd: " Theodore Ts'o
2011-04-30 17:17     ` Ted Ts'o
2011-05-02 15:07       ` Jan Kara
2011-05-02 18:29         ` Ted Ts'o
2011-05-02 19:04           ` Jan Kara
2011-05-02 21:31             ` Ted Ts'o
2011-05-04 14:21               ` Martin_Zielinski
2011-05-04 21:55                 ` Jan Kara
2011-05-05 14:11                   ` Martin_Zielinski
2011-05-05 15:53                     ` Jan Kara
2011-05-05 14:55                   ` Martin_Zielinski
2011-05-05 15:43                     ` Jan Kara
2011-04-26  9:07   ` 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c Martin_Zielinski
2011-04-26 12:23     ` Ted Ts'o
2011-04-26 12:45       ` Martin_Zielinski
2011-04-26 17:20         ` Ted Ts'o [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110426172057.GH9486@thunk.org \
    --to=tytso@mit.edu \
    --cc=Martin_Zielinski@mcafee.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.