From: Theodore Ts'o <tytso@mit.edu>
To: Ben Hutchings <ben@decadent.org.uk>
Cc: George Barnett <gbarnett@atlassian.com>,
linux-ext4@vger.kernel.org,
Debian kernel maintainers <debian-kernel@lists.debian.org>
Subject: [PATCH] ext4/jbd2: don't wait (forever) for stale tid caused by wraparound
Date: Sun, 17 Mar 2013 22:54:01 -0400 [thread overview]
Message-ID: <20130318025401.GA12611@thunk.org> (raw)
In-Reply-To: <1363412062.3937.196.camel@deadeye.wl.decadent.org.uk>
On Sat, Mar 16, 2013 at 05:34:22AM +0000, Ben Hutchings wrote:
> > We use debian for a number of machines in our storage infrastructure
> > and we have recently been seeing a number of "hangs". We primary
> > notice this by seeing nfsd processes locking up and then a hung task
> > killer going wild. We finally managed to get a trace last night - its
> > pasted below:
Thanks for reporting this. I thought we had fixed this in 3.0.
Before then, when we had a tid wrap, it would result in kjournald
spinning forever. I suspect this was your "spontaneous reboots" that
you mentioned you mentioned when you were using 2.6.39 --- did you
have a hardware or softward watchdog timer enabled by any chance?
Since we didn't have a good way of reproducing the problem at the
time, I didn't realize that the problem had not been fully fixed;
since while jbd2_log_start_commit() would no longer cause kjournald to
spin forwever, a subsequent call to jbd2_log_wait_commit() with a
stale transaction id would wait for a very long time (possibly until
the heat death of the universe :-)
I think a patch like this should fix things; I've run a stress test
with a hack to increment the transaction id by 1 << 24 after each
commit, to more quickly cause an tid wrap, and the regression tests
seem to be passing without complaint.
- Ted
>From 76b05344fef573701b22ced444223188f054f94d Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Sun, 17 Mar 2013 22:24:46 -0400
Subject: [PATCH] ext4/jbd2: don't wait (forever) for stale tid caused by
wraparound
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.
Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().
Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.
As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
Cc: stable@vger.kernel.org
---
fs/ext4/fsync.c | 3 +--
fs/ext4/inode.c | 3 +--
fs/jbd2/journal.c | 31 +++++++++++++++++++++++++++++++
include/linux/jbd2.h | 1 +
4 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 3278e64..e0ba8a4 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -166,8 +166,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
if (journal->j_flags & JBD2_BARRIER &&
!jbd2_trans_will_send_data_barrier(journal, commit_tid))
needs_barrier = true;
- jbd2_log_start_commit(journal, commit_tid);
- ret = jbd2_log_wait_commit(journal, commit_tid);
+ ret = jbd2_complete_transaction(journal, commit_tid);
if (needs_barrier) {
err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
if (!ret)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b6fab7c..de4b58d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -211,8 +211,7 @@ void ext4_evict_inode(struct inode *inode)
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
tid_t commit_tid = EXT4_I(inode)->i_datasync_tid;
- jbd2_log_start_commit(journal, commit_tid);
- jbd2_log_wait_commit(journal, commit_tid);
+ jbd2_complete_transaction(journal, commit_tid);
filemap_write_and_wait(&inode->i_data);
}
truncate_inode_pages(&inode->i_data, 0);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index ed10991..886ec2f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -710,6 +710,37 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
}
/*
+ * When this function returns the transaction corresponding to tid
+ * will be completed. If the transaction has currently running, start
+ * committing that transaction before waiting for it to complete. If
+ * the transaction id is stale, it is by definition already completed,
+ * so just return SUCCESS.
+ */
+int jbd2_complete_transaction(journal_t *journal, tid_t tid)
+{
+ int need_to_wait = 1;
+
+ read_lock(&journal->j_state_lock);
+ if (journal->j_running_transaction &&
+ journal->j_running_transaction->t_tid == tid) {
+ if (journal->j_commit_request != tid) {
+ /* transaction not yet started, so request it */
+ read_unlock(&journal->j_state_lock);
+ jbd2_log_start_commit(journal, tid);
+ goto wait_commit;
+ }
+ } else if (!(journal->j_committing_transaction &&
+ journal->j_committing_transaction->t_tid == tid))
+ need_to_wait = 0;
+ read_unlock(&journal->j_state_lock);
+ if (!need_to_wait)
+ return 0;
+wait_commit:
+ return jbd2_log_wait_commit(journal, tid);
+}
+EXPORT_SYMBOL(jbd2_complete_transaction);
+
+/*
* Log buffer allocation routines:
*/
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 50e5a5e..f028975 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1200,6 +1200,7 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
int jbd2_journal_force_commit_nested(journal_t *journal);
int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
+int jbd2_complete_transaction(journal_t *journal, tid_t tid);
int jbd2_log_do_checkpoint(journal_t *journal);
int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
--
1.7.12.rc0.22.gcdd159b
next prev parent reply other threads:[~2013-03-18 2:54 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <B2EC601CDDA242189A46599B31EA6AD3@atlassian.com>
2013-03-16 5:34 ` jbd2 tid wrap seen on NFS server Ben Hutchings
2013-03-18 2:54 ` Theodore Ts'o [this message]
2013-03-18 4:24 ` [PATCH] ext4/jbd2: don't wait (forever) for stale tid caused by wraparound George Barnett
2013-03-18 4:53 ` Ben Hutchings
2013-03-18 14:31 ` Theodore Ts'o
2013-03-18 14:34 ` Theodore Ts'o
2013-03-21 20:46 ` Jan Kara
2013-03-21 21:09 ` Theodore Ts'o
2013-03-21 22:41 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130318025401.GA12611@thunk.org \
--to=tytso@mit.edu \
--cc=ben@decadent.org.uk \
--cc=debian-kernel@lists.debian.org \
--cc=gbarnett@atlassian.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.