linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] ext3/jbd: possible filesystem corruption fixes (take 3)
@ 2008-07-24 12:34 Hidehiro Kawai
  2008-07-24 12:37 ` [PATCH 1/4] jbd: abort when failed to log metadata buffers Hidehiro Kawai
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-07-24 12:34 UTC (permalink / raw)
  To: akpm, sct
  Cc: linux-kernel, linux-ext4, jack, jbacik, cmm, tytso, adilger,
	snitzer, tglx, yumiko.sugita.yf, satoshi.oshima.fk

This patch set is the take 3 of fixing error handling problem in
ext3/JBD.  The previous discussion can be found here:
http://kerneltrap.org/mailarchive/linux-kernel/2008/6/2/2002094

Problem
=======
Currently some error checkings are missing, so the journal cannot abort
correctly.  This can cause a filesystem corruption.  Missing error
checkings are:

(1) error check for the metadata writes to the journal before the
    commit (addressed by PATCH 1/4)
(2) error check for checkpointing and replay (addressed by PATCH 2/4
    and 3/4)

PATCH 2/4 makes another problem worse; replaying old journaled
metadata can overwrite the latest metadata on the filesystem and
break its consistency.  This is fixed by PATCH 4/4.


Changes since Take 2
====================
[PATCH x/x]
o file data error handling fixes were separated to other patch set

[PATCH 3/4]
o fix return value handlings (trivial)

[PATCH 4/4]
o newly added to prevent unjournaled metadata buffers from being
  written to the filesystem on abort

Regards,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/4] jbd: abort when failed to log metadata buffers
  2008-07-24 12:34 [PATCH 0/4] ext3/jbd: possible filesystem corruption fixes (take 3) Hidehiro Kawai
@ 2008-07-24 12:37 ` Hidehiro Kawai
  2008-09-11 17:41   ` Eric Sandeen
  2008-07-24 12:38 ` [PATCH 2/4] jbd: fix error handling for checkpoint io Hidehiro Kawai
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Hidehiro Kawai @ 2008-07-24 12:37 UTC (permalink / raw)
  To: akpm, sct
  Cc: linux-kernel, linux-ext4, jack, jbacik, cmm, tytso, adilger,
	snitzer, tglx, yumiko.sugita.yf, satoshi.oshima.fk

If we failed to write metadata buffers to the journal space and
succeeded to write the commit record, stale data can be written
back to the filesystem as metadata in the recovery phase.

To avoid this, when we failed to write out metadata buffers,
abort the journal before writing the commit record.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/jbd/commit.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6.26-rc8-mm1/fs/jbd/commit.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/fs/jbd/commit.c
+++ linux-2.6.26-rc8-mm1/fs/jbd/commit.c
@@ -762,6 +762,9 @@ wait_for_iobuf:
 		/* AKPM: bforget here */
 	}
 
+	if (err)
+		journal_abort(journal, err);
+
 	jbd_debug(3, "JBD: commit phase 6\n");
 
 	if (journal_write_commit_record(journal, commit_transaction))



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/4] jbd: fix error handling for checkpoint io
  2008-07-24 12:34 [PATCH 0/4] ext3/jbd: possible filesystem corruption fixes (take 3) Hidehiro Kawai
  2008-07-24 12:37 ` [PATCH 1/4] jbd: abort when failed to log metadata buffers Hidehiro Kawai
@ 2008-07-24 12:38 ` Hidehiro Kawai
  2008-07-24 12:40 ` [PATCH 3/4] ext3: abort ext3 if the journal has aborted Hidehiro Kawai
  2008-07-24 12:41 ` [PATCH 4/4] jbd: don't dirty original metadata buffer on abort Hidehiro Kawai
  3 siblings, 0 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-07-24 12:38 UTC (permalink / raw)
  To: akpm, sct
  Cc: linux-kernel, linux-ext4, jack, jbacik, cmm, tytso, adilger,
	snitzer, tglx, yumiko.sugita.yf, satoshi.oshima.fk

When a checkpointing IO fails, current JBD code doesn't check the
error and continue journaling.  This means latest metadata can be
lost from both the journal and filesystem.

This patch leaves the failed metadata blocks in the journal space
and aborts journaling in the case of log_do_checkpoint().
To achieve this, we need to do:

1. don't remove the failed buffer from the checkpoint list where in
   the case of __try_to_free_cp_buf() because it may be released or
   overwritten by a later transaction
2. log_do_checkpoint() is the last chance, remove the failed buffer
   from the checkpoint list and abort the journal
3. when checkpointing fails, don't update the journal super block to
   prevent the journaled contents from being cleaned.  For safety,
   don't update j_tail and j_tail_sequence either
4. when checkpointing fails, notify this error to the ext3 layer so
   that ext3 don't clear the needs_recovery flag, otherwise the
   journaled contents are ignored and cleaned in the recovery phase
5. if the recovery fails, keep the needs_recovery flag
6. prevent cleanup_journal_tail() from being called between
   __journal_drop_transaction() and journal_abort() (a race issue
   between journal_flush() and __log_wait_for_space()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/jbd/checkpoint.c |   49 +++++++++++++++++++++++++++++++-----------
 fs/jbd/journal.c    |   28 ++++++++++++++++++------
 fs/jbd/recovery.c   |    7 ++++--
 include/linux/jbd.h |    2 -
 4 files changed, 65 insertions(+), 21 deletions(-)

Index: linux-2.6.26-rc8-mm1/fs/jbd/checkpoint.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/fs/jbd/checkpoint.c
+++ linux-2.6.26-rc8-mm1/fs/jbd/checkpoint.c
@@ -93,7 +93,8 @@ static int __try_to_free_cp_buf(struct j
 	int ret = 0;
 	struct buffer_head *bh = jh2bh(jh);
 
-	if (jh->b_jlist == BJ_None && !buffer_locked(bh) && !buffer_dirty(bh)) {
+	if (jh->b_jlist == BJ_None && !buffer_locked(bh) &&
+	    !buffer_dirty(bh) && buffer_uptodate(bh)) {
 		JBUFFER_TRACE(jh, "remove from checkpoint list");
 		ret = __journal_remove_checkpoint(jh) + 1;
 		jbd_unlock_bh_state(bh);
@@ -160,21 +161,25 @@ static void jbd_sync_bh(journal_t *journ
  * buffers. Note that we take the buffers in the opposite ordering
  * from the one in which they were submitted for IO.
  *
+ * Return 0 on success, and return <0 if some buffers have failed
+ * to be written out.
+ *
  * Called with j_list_lock held.
  */
-static void __wait_cp_io(journal_t *journal, transaction_t *transaction)
+static int __wait_cp_io(journal_t *journal, transaction_t *transaction)
 {
 	struct journal_head *jh;
 	struct buffer_head *bh;
 	tid_t this_tid;
 	int released = 0;
+	int ret = 0;
 
 	this_tid = transaction->t_tid;
 restart:
 	/* Did somebody clean up the transaction in the meanwhile? */
 	if (journal->j_checkpoint_transactions != transaction ||
 			transaction->t_tid != this_tid)
-		return;
+		return ret;
 	while (!released && transaction->t_checkpoint_io_list) {
 		jh = transaction->t_checkpoint_io_list;
 		bh = jh2bh(jh);
@@ -194,6 +199,9 @@ restart:
 			spin_lock(&journal->j_list_lock);
 			goto restart;
 		}
+		if (unlikely(!buffer_uptodate(bh)))
+			ret = -EIO;
+
 		/*
 		 * Now in whatever state the buffer currently is, we know that
 		 * it has been written out and so we can drop it from the list
@@ -203,6 +211,8 @@ restart:
 		journal_remove_journal_head(bh);
 		__brelse(bh);
 	}
+
+	return ret;
 }
 
 #define NR_BATCH	64
@@ -226,7 +236,8 @@ __flush_batch(journal_t *journal, struct
  * Try to flush one buffer from the checkpoint list to disk.
  *
  * Return 1 if something happened which requires us to abort the current
- * scan of the checkpoint list.
+ * scan of the checkpoint list.  Return <0 if the buffer has failed to
+ * be written out.
  *
  * Called with j_list_lock held and drops it if 1 is returned
  * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
@@ -256,6 +267,9 @@ static int __process_buffer(journal_t *j
 		log_wait_commit(journal, tid);
 		ret = 1;
 	} else if (!buffer_dirty(bh)) {
+		ret = 1;
+		if (unlikely(!buffer_uptodate(bh)))
+			ret = -EIO;
 		J_ASSERT_JH(jh, !buffer_jbddirty(bh));
 		BUFFER_TRACE(bh, "remove from checkpoint");
 		__journal_remove_checkpoint(jh);
@@ -263,7 +277,6 @@ static int __process_buffer(journal_t *j
 		jbd_unlock_bh_state(bh);
 		journal_remove_journal_head(bh);
 		__brelse(bh);
-		ret = 1;
 	} else {
 		/*
 		 * Important: we are about to write the buffer, and
@@ -295,6 +308,7 @@ static int __process_buffer(journal_t *j
  * to disk. We submit larger chunks of data at once.
  *
  * The journal should be locked before calling this function.
+ * Called with j_checkpoint_mutex held.
  */
 int log_do_checkpoint(journal_t *journal)
 {
@@ -318,6 +332,7 @@ int log_do_checkpoint(journal_t *journal
 	 * OK, we need to start writing disk blocks.  Take one transaction
 	 * and write it.
 	 */
+	result = 0;
 	spin_lock(&journal->j_list_lock);
 	if (!journal->j_checkpoint_transactions)
 		goto out;
@@ -334,7 +349,7 @@ restart:
 		int batch_count = 0;
 		struct buffer_head *bhs[NR_BATCH];
 		struct journal_head *jh;
-		int retry = 0;
+		int retry = 0, err;
 
 		while (!retry && transaction->t_checkpoint_list) {
 			struct buffer_head *bh;
@@ -347,6 +362,8 @@ restart:
 				break;
 			}
 			retry = __process_buffer(journal, jh, bhs,&batch_count);
+			if (retry < 0 && !result)
+				result = retry;
 			if (!retry && (need_resched() ||
 				spin_needbreak(&journal->j_list_lock))) {
 				spin_unlock(&journal->j_list_lock);
@@ -371,14 +388,18 @@ restart:
 		 * Now we have cleaned up the first transaction's checkpoint
 		 * list. Let's clean up the second one
 		 */
-		__wait_cp_io(journal, transaction);
+		err = __wait_cp_io(journal, transaction);
+		if (!result)
+			result = err;
 	}
 out:
 	spin_unlock(&journal->j_list_lock);
-	result = cleanup_journal_tail(journal);
 	if (result < 0)
-		return result;
-	return 0;
+		journal_abort(journal, result);
+	else
+		result = cleanup_journal_tail(journal);
+
+	return (result < 0) ? result : 0;
 }
 
 /*
@@ -394,8 +415,9 @@ out:
  * This is the only part of the journaling code which really needs to be
  * aware of transaction aborts.  Checkpointing involves writing to the
  * main filesystem area rather than to the journal, so it can proceed
- * even in abort state, but we must not update the journal superblock if
- * we have an abort error outstanding.
+ * even in abort state, but we must not update the super block if
+ * checkpointing may have failed.  Otherwise, we would lose some metadata
+ * buffers which should be written-back to the filesystem.
  */
 
 int cleanup_journal_tail(journal_t *journal)
@@ -404,6 +426,9 @@ int cleanup_journal_tail(journal_t *jour
 	tid_t		first_tid;
 	unsigned long	blocknr, freed;
 
+	if (is_journal_aborted(journal))
+		return 1;
+
 	/* OK, work out the oldest transaction remaining in the log, and
 	 * the log block it starts at.
 	 *
Index: linux-2.6.26-rc8-mm1/fs/jbd/journal.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/fs/jbd/journal.c
+++ linux-2.6.26-rc8-mm1/fs/jbd/journal.c
@@ -1121,9 +1121,12 @@ recovery_error:
  *
  * Release a journal_t structure once it is no longer in use by the
  * journaled object.
+ * Return <0 if we couldn't clean up the journal.
  */
-void journal_destroy(journal_t *journal)
+int journal_destroy(journal_t *journal)
 {
+	int err = 0;
+
 	/* Wait for the commit thread to wake up and die. */
 	journal_kill_thread(journal);
 
@@ -1146,11 +1149,16 @@ void journal_destroy(journal_t *journal)
 	J_ASSERT(journal->j_checkpoint_transactions == NULL);
 	spin_unlock(&journal->j_list_lock);
 
-	/* We can now mark the journal as empty. */
-	journal->j_tail = 0;
-	journal->j_tail_sequence = ++journal->j_transaction_sequence;
 	if (journal->j_sb_buffer) {
-		journal_update_superblock(journal, 1);
+		if (!is_journal_aborted(journal)) {
+			/* We can now mark the journal as empty. */
+			journal->j_tail = 0;
+			journal->j_tail_sequence =
+				++journal->j_transaction_sequence;
+			journal_update_superblock(journal, 1);
+		} else {
+			err = -EIO;
+		}
 		brelse(journal->j_sb_buffer);
 	}
 
@@ -1160,6 +1168,8 @@ void journal_destroy(journal_t *journal)
 		journal_destroy_revoke(journal);
 	kfree(journal->j_wbuf);
 	kfree(journal);
+
+	return err;
 }
 
 
@@ -1359,10 +1369,16 @@ int journal_flush(journal_t *journal)
 	spin_lock(&journal->j_list_lock);
 	while (!err && journal->j_checkpoint_transactions != NULL) {
 		spin_unlock(&journal->j_list_lock);
+		mutex_lock(&journal->j_checkpoint_mutex);
 		err = log_do_checkpoint(journal);
+		mutex_unlock(&journal->j_checkpoint_mutex);
 		spin_lock(&journal->j_list_lock);
 	}
 	spin_unlock(&journal->j_list_lock);
+
+	if (is_journal_aborted(journal))
+		return -EIO;
+
 	cleanup_journal_tail(journal);
 
 	/* Finally, mark the journal as really needing no recovery.
@@ -1384,7 +1400,7 @@ int journal_flush(journal_t *journal)
 	J_ASSERT(journal->j_head == journal->j_tail);
 	J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
 	spin_unlock(&journal->j_state_lock);
-	return err;
+	return 0;
 }
 
 /**
Index: linux-2.6.26-rc8-mm1/fs/jbd/recovery.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/fs/jbd/recovery.c
+++ linux-2.6.26-rc8-mm1/fs/jbd/recovery.c
@@ -223,7 +223,7 @@ do {									\
  */
 int journal_recover(journal_t *journal)
 {
-	int			err;
+	int			err, err2;
 	journal_superblock_t *	sb;
 
 	struct recovery_info	info;
@@ -261,7 +261,10 @@ int journal_recover(journal_t *journal)
 	journal->j_transaction_sequence = ++info.end_transaction;
 
 	journal_clear_revoke(journal);
-	sync_blockdev(journal->j_fs_dev);
+	err2 = sync_blockdev(journal->j_fs_dev);
+	if (!err)
+		err = err2;
+
 	return err;
 }
 
Index: linux-2.6.26-rc8-mm1/include/linux/jbd.h
===================================================================
--- linux-2.6.26-rc8-mm1.orig/include/linux/jbd.h
+++ linux-2.6.26-rc8-mm1/include/linux/jbd.h
@@ -908,7 +908,7 @@ extern int	   journal_set_features
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int	   journal_create     (journal_t *);
 extern int	   journal_load       (journal_t *journal);
-extern void	   journal_destroy    (journal_t *);
+extern int	   journal_destroy    (journal_t *);
 extern int	   journal_recover    (journal_t *journal);
 extern int	   journal_wipe       (journal_t *, int);
 extern int	   journal_skip_recovery	(journal_t *);



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/4] ext3: abort ext3 if the journal has aborted
  2008-07-24 12:34 [PATCH 0/4] ext3/jbd: possible filesystem corruption fixes (take 3) Hidehiro Kawai
  2008-07-24 12:37 ` [PATCH 1/4] jbd: abort when failed to log metadata buffers Hidehiro Kawai
  2008-07-24 12:38 ` [PATCH 2/4] jbd: fix error handling for checkpoint io Hidehiro Kawai
@ 2008-07-24 12:40 ` Hidehiro Kawai
  2008-07-28  6:26   ` Hidehiro Kawai
  2008-07-24 12:41 ` [PATCH 4/4] jbd: don't dirty original metadata buffer on abort Hidehiro Kawai
  3 siblings, 1 reply; 8+ messages in thread
From: Hidehiro Kawai @ 2008-07-24 12:40 UTC (permalink / raw)
  To: akpm, sct, adilger
  Cc: linux-kernel, linux-ext4, jack, jbacik, cmm, tytso, snitzer, tglx,
	yumiko.sugita.yf, satoshi.oshima.fk

If the journal has aborted due to a checkpointing failure, we
have to keep the contents of the journal space.  ext3_put_super()
detects the journal abort, then it invokes ext3_abort() to make
the filesystem read only and keep needs_recovery flag.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/ioctl.c |   12 ++++++++----
 fs/ext3/super.c |   11 +++++++++--
 2 files changed, 17 insertions(+), 6 deletions(-)

Index: linux-2.6.26-rc8-mm1/fs/ext3/ioctl.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/fs/ext3/ioctl.c
+++ linux-2.6.26-rc8-mm1/fs/ext3/ioctl.c
@@ -239,7 +239,7 @@ setrsvsz_out:
 	case EXT3_IOC_GROUP_EXTEND: {
 		ext3_fsblk_t n_blocks_count;
 		struct super_block *sb = inode->i_sb;
-		int err;
+		int err, err2;
 
 		if (!capable(CAP_SYS_RESOURCE))
 			return -EPERM;
@@ -254,8 +254,10 @@ setrsvsz_out:
 		}
 		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
 		journal_lock_updates(EXT3_SB(sb)->s_journal);
-		journal_flush(EXT3_SB(sb)->s_journal);
+		err2 = journal_flush(EXT3_SB(sb)->s_journal);
 		journal_unlock_updates(EXT3_SB(sb)->s_journal);
+		if (err == 0)
+			err = err2;
 group_extend_out:
 		mnt_drop_write(filp->f_path.mnt);
 		return err;
@@ -263,7 +265,7 @@ group_extend_out:
 	case EXT3_IOC_GROUP_ADD: {
 		struct ext3_new_group_data input;
 		struct super_block *sb = inode->i_sb;
-		int err;
+		int err, err2;
 
 		if (!capable(CAP_SYS_RESOURCE))
 			return -EPERM;
@@ -280,8 +282,10 @@ group_extend_out:
 
 		err = ext3_group_add(sb, &input);
 		journal_lock_updates(EXT3_SB(sb)->s_journal);
-		journal_flush(EXT3_SB(sb)->s_journal);
+		err2 = journal_flush(EXT3_SB(sb)->s_journal);
 		journal_unlock_updates(EXT3_SB(sb)->s_journal);
+		if (err == 0)
+			err = err2;
 group_add_out:
 		mnt_drop_write(filp->f_path.mnt);
 		return err;
Index: linux-2.6.26-rc8-mm1/fs/ext3/super.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/fs/ext3/super.c
+++ linux-2.6.26-rc8-mm1/fs/ext3/super.c
@@ -393,7 +393,8 @@ static void ext3_put_super (struct super
 	int i;
 
 	ext3_xattr_put_super(sb);
-	journal_destroy(sbi->s_journal);
+	if (journal_destroy(sbi->s_journal) < 0)
+		ext3_abort(sb, __func__, "Couldn't clean up the journal");
 	if (!(sb->s_flags & MS_RDONLY)) {
 		EXT3_CLEAR_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
 		es->s_state = cpu_to_le16(sbi->s_mount_state);
@@ -2388,7 +2389,13 @@ static void ext3_write_super_lockfs(stru
 
 		/* Now we set up the journal barrier. */
 		journal_lock_updates(journal);
-		journal_flush(journal);
+
+		/*
+		 * We don't want to clear needs_recovery flag when we failed
+		 * to flush the journal.
+		 */
+		if (journal_flush(journal) < 0)
+			return;
 
 		/* Journal blocked and flushed, clear needs_recovery flag. */
 		EXT3_CLEAR_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 4/4] jbd: don't dirty original metadata buffer on abort
  2008-07-24 12:34 [PATCH 0/4] ext3/jbd: possible filesystem corruption fixes (take 3) Hidehiro Kawai
                   ` (2 preceding siblings ...)
  2008-07-24 12:40 ` [PATCH 3/4] ext3: abort ext3 if the journal has aborted Hidehiro Kawai
@ 2008-07-24 12:41 ` Hidehiro Kawai
  3 siblings, 0 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-07-24 12:41 UTC (permalink / raw)
  To: akpm, sct
  Cc: linux-kernel, linux-ext4, jack, jbacik, cmm, tytso, adilger,
	snitzer, tglx, yumiko.sugita.yf, satoshi.oshima.fk

Currently, original metadata buffers are dirtied when they are
unfiled whether the journal has aborted or not.  Eventually these
buffers will be written-back to the filesystem by pdflush.  This
means some metadata buffers are written to the filesystem without
journaling if the journal aborts.  So if both journal abort and
system crash happen at the same time, the filesystem would become
inconsistent state.  Additionally, replaying journaled metadata
can overwrite the latest metadata on the filesystem partly.
Because, if the journal aborts, journaled metadata are preserved
and replayed during the next mount not to lose uncheckpointed
metadata.  This would also break the consistency of the filesystem.

This patch prevents original metadata buffers from being dirtied
on abort by clearing BH_JBDDirty flag from those buffers.  Thus,
no metadata buffers are written to the filesystem without journaling.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
---
 fs/jbd/commit.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6.26-rc8-mm1/fs/jbd/commit.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/fs/jbd/commit.c
+++ linux-2.6.26-rc8-mm1/fs/jbd/commit.c
@@ -518,9 +518,10 @@ void journal_commit_transaction(journal_
 		jh = commit_transaction->t_buffers;
 
 		/* If we're in abort mode, we just un-journal the buffer and
-		   release it for background writing. */
+		   release it. */
 
 		if (is_journal_aborted(journal)) {
+			clear_buffer_jbddirty(jh2bh(jh));
 			JBUFFER_TRACE(jh, "journal is aborting: refile");
 			journal_refile_buffer(journal, jh);
 			/* If that was the last one, we need to clean up
@@ -855,6 +856,8 @@ restart_loop:
 		if (buffer_jbddirty(bh)) {
 			JBUFFER_TRACE(jh, "add to new checkpointing trans");
 			__journal_insert_checkpoint(jh, commit_transaction);
+			if (is_journal_aborted(journal))
+				clear_buffer_jbddirty(bh);
 			JBUFFER_TRACE(jh, "refile for checkpoint writeback");
 			__journal_refile_buffer(jh);
 			jbd_unlock_bh_state(bh);



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/4] ext3: abort ext3 if the journal has aborted
  2008-07-24 12:40 ` [PATCH 3/4] ext3: abort ext3 if the journal has aborted Hidehiro Kawai
@ 2008-07-28  6:26   ` Hidehiro Kawai
  0 siblings, 0 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-07-28  6:26 UTC (permalink / raw)
  To: akpm
  Cc: sct, adilger, linux-kernel, linux-ext4, jack, jbacik, cmm, tytso,
	snitzer, tglx, yumiko.sugita.yf, satoshi.oshima.fk

> 4. when checkpointing fails, notify this error to the ext3 layer so
>    that ext3 don't clear the needs_recovery flag, otherwise the
>    journaled contents are ignored and cleaned in the recovery phase

Mike Snitzer noticed that ext3_mark_recovery_complete() doesn't
check checkpointing failure and it clears needs_recovery flag 
(thanks, Mike!).  I need an additional fix.

I also found ext3_quota_on() forces checkpointing by journal_flush()
but it doesn't check the error.  This will appear in 2.6.27-rc1,
so I'll send the revised patch set for 2.6.27-rc1 (it may be
a separate patch).

Thanks,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/4] jbd: abort when failed to log metadata buffers
  2008-07-24 12:37 ` [PATCH 1/4] jbd: abort when failed to log metadata buffers Hidehiro Kawai
@ 2008-09-11 17:41   ` Eric Sandeen
  2008-09-11 18:32     ` Eric Sandeen
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2008-09-11 17:41 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: akpm, sct, linux-kernel, linux-ext4, jack, jbacik, cmm, tytso,
	adilger, snitzer, tglx, yumiko.sugita.yf, satoshi.oshima.fk

Hidehiro Kawai wrote:
> If we failed to write metadata buffers to the journal space and
> succeeded to write the commit record, stale data can be written
> back to the filesystem as metadata in the recovery phase.
> 
> To avoid this, when we failed to write out metadata buffers,
> abort the journal before writing the commit record.
> 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Acked-by: Jan Kara <jack@suse.cz>
> ---
>  fs/jbd/commit.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6.26-rc8-mm1/fs/jbd/commit.c
> ===================================================================
> --- linux-2.6.26-rc8-mm1.orig/fs/jbd/commit.c
> +++ linux-2.6.26-rc8-mm1/fs/jbd/commit.c
> @@ -762,6 +762,9 @@ wait_for_iobuf:
>  		/* AKPM: bforget here */
>  	}
>  
> +	if (err)
> +		journal_abort(journal, err);
> +
>  	jbd_debug(3, "JBD: commit phase 6\n");
>  
>  	if (journal_write_commit_record(journal, commit_transaction))

A little late for me to chime in, but why is this a journal_abort() when
journal aborts before and after this call __journal_abort_hard()?

I'm just not sure what's different about this one condition that we wish
to update the error and the superblock here?

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/4] jbd: abort when failed to log metadata buffers
  2008-09-11 17:41   ` Eric Sandeen
@ 2008-09-11 18:32     ` Eric Sandeen
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2008-09-11 18:32 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: akpm, sct, linux-kernel, linux-ext4, jack, jbacik, cmm, tytso,
	adilger, snitzer, tglx, yumiko.sugita.yf, satoshi.oshima.fk

Eric Sandeen wrote:
> Hidehiro Kawai wrote:
>> If we failed to write metadata buffers to the journal space and
>> succeeded to write the commit record, stale data can be written
>> back to the filesystem as metadata in the recovery phase.
>>
>> To avoid this, when we failed to write out metadata buffers,
>> abort the journal before writing the commit record.
>>
>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
>> Acked-by: Jan Kara <jack@suse.cz>
>> ---
>>  fs/jbd/commit.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> Index: linux-2.6.26-rc8-mm1/fs/jbd/commit.c
>> ===================================================================
>> --- linux-2.6.26-rc8-mm1.orig/fs/jbd/commit.c
>> +++ linux-2.6.26-rc8-mm1/fs/jbd/commit.c
>> @@ -762,6 +762,9 @@ wait_for_iobuf:
>>  		/* AKPM: bforget here */
>>  	}
>>  
>> +	if (err)
>> +		journal_abort(journal, err);
>> +
>>  	jbd_debug(3, "JBD: commit phase 6\n");
>>  
>>  	if (journal_write_commit_record(journal, commit_transaction))
> 
> A little late for me to chime in, but why is this a journal_abort() when
> journal aborts before and after this call __journal_abort_hard()?
> 
> I'm just not sure what's different about this one condition that we wish
> to update the error and the superblock here?


I take that back, was looking at an old tree, and Josef pointed out
commit 7a266e75cf5a1efd20d084408a1b7f1a185496dd to me that changes this,
and now it makes much more sense.  :)

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7a266e75cf5a1efd20d084408a1b7f1a185496dd

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-09-11 18:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-24 12:34 [PATCH 0/4] ext3/jbd: possible filesystem corruption fixes (take 3) Hidehiro Kawai
2008-07-24 12:37 ` [PATCH 1/4] jbd: abort when failed to log metadata buffers Hidehiro Kawai
2008-09-11 17:41   ` Eric Sandeen
2008-09-11 18:32     ` Eric Sandeen
2008-07-24 12:38 ` [PATCH 2/4] jbd: fix error handling for checkpoint io Hidehiro Kawai
2008-07-24 12:40 ` [PATCH 3/4] ext3: abort ext3 if the journal has aborted Hidehiro Kawai
2008-07-28  6:26   ` Hidehiro Kawai
2008-07-24 12:41 ` [PATCH 4/4] jbd: don't dirty original metadata buffer on abort Hidehiro Kawai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).