From: Jan Kara <jack@suse.cz>
To: Ted Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
Martin_Zielinski@McAfee.com
Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
Date: Mon, 2 May 2011 17:07:58 +0200
Message-ID: <20110502150758.GH4556@quack.suse.cz>
In-Reply-To: <20110430171711.GA2819@thunk.org>
Hi Ted,
On Sat 30-04-11 13:17:11, Ted Tso wrote:
> I don't know if you've been following this thread, but I was wondering
> if you could review this patch, (a) for inclusion in the ext3 tree,
> and (b) because I'd appreciate a second pair of eyes looking at this
> patch, since I intend to push a similar change to jbd2.
Thanks for forwarding. For some reason I got unsubscribed from linux-ext4
a while ago and didn't notice this since linux-fsdevel goes into the same
mailbox.
> I'm not entirely convinced this is caused by tid's wrapping around,
> since that would be a huge number of commits, but if it's not that,
> somehow i_datasync_tid or i_sync_tid is either getting corrupted or
> not getting set --- and I have no idea how that could be happening.
> This patch should at least keep the system from crashing when we hit
> the case, and handle the situation harmlessly --- at worst with a
> journal commit that wouldn't otherwise be needed.
The patch looks OK in any case. I'll take it in my tree. It would take
almost 25 days of constant 1000 trans/s load (2^31 commits) to trigger
this. That's quite a heavy load, but not so unrealistic with today's HW.
> As background, I've been on this bug for months now, as it's been
> reported to me as occasionally happening on Android devices that have
> been using ext4. Since I hadn't seen any reports of this in the field
> in the x86 world, and this code hadn't changed in a long, long time, I
> had originally assumed it was an ARM-specific bug. However, recently,
> Martin Zielinski (on this thread) has reported this problem on an x86
> system --- and on an ext3 system to boot.
>
> Martin suspects it may have to do with SQLite --- which is consistent
> with what I've seen, since I believe Android devices use SQLite quite
> heavily as well.
Yeah, it may be.
Honza
> jbd: fix fsync() tid wraparound bug
>
> If an application program does not make any changes to the indirect
> blocks or extent tree, i_datasync_tid will not get updated. If there
> are enough commits (i.e., 2**31) such that tid_geq()'s calculations
> wrap, and there isn't a currently active transaction at the time of
> the fdatasync() call, this can end up triggering a BUG_ON in
> fs/jbd/commit.c:
>
> J_ASSERT(journal->j_running_transaction != NULL);
>
> It's pretty rare that this can happen, since it requires the use of
> fdatasync() plus *very* frequent and excessive use of fsync(). But
> with the right workload, it can.
>
> We fix this by replacing the use of tid_geq() with an equality test,
> since there's only one transaction ID that it is valid for us to wait
> upon to be committed: namely, the currently running transaction
> (if it exists).
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> fs/jbd/journal.c | 16 +++++++++++++---
> 1 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
> index b3713af..1b71ce6 100644
> --- a/fs/jbd/journal.c
> +++ b/fs/jbd/journal.c
> @@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal)
>  int __log_start_commit(journal_t *journal, tid_t target)
>  {
>  	/*
> -	 * Are we already doing a recent enough commit?
> +	 * The only transaction we can possibly wait upon is the
> +	 * currently running transaction (if it exists). Otherwise,
> +	 * the target tid must be an old one.
>  	 */
> -	if (!tid_geq(journal->j_commit_request, target)) {
> +	if (journal->j_running_transaction &&
> +	    journal->j_running_transaction->t_tid == target) {
>  		/*
>  		 * We want a new commit: OK, mark the request and wakeup the
>  		 * commit thread. We do _not_ do the commit ourselves.
> @@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target)
>  			  journal->j_commit_sequence);
>  		wake_up(&journal->j_wait_commit);
>  		return 1;
> -	}
> +	} else if (!tid_geq(journal->j_commit_request, target))
> +		/* This should never happen, but if it does, preserve
> +		   the evidence before kjournald goes into a loop and
> +		   increments j_commit_sequence beyond all recognition. */
> +		WARN(1, "jbd: bad log_start_commit: %u %u %u %u\n",
> +		     journal->j_commit_request, journal->j_commit_sequence,
> +		     target, journal->j_running_transaction ?
> +		     journal->j_running_transaction->t_tid : 0);
>  	return 0;
>  }
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR