[PATCH 3.4 32/65] jbd2: fix ocfs2 corrupt when updating journal superblock fails

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: lizf@kernel.org
To: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Joseph Qi <joseph.qi@huawei.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Junxiao Bi <junxiao.bi@oracle.com>, Zefan Li <lizefan@huawei.com>
Subject: [PATCH 3.4 32/65] jbd2: fix ocfs2 corrupt when updating journal superblock fails
Date: Tue, 20 Oct 2015 08:47:42 +0800	[thread overview]
Message-ID: <1445302095-4695-32-git-send-email-lizf@kernel.org> (raw)
In-Reply-To: <1445302030-4607-1-git-send-email-lizf@kernel.org>

From: Joseph Qi <joseph.qi@huawei.com>

3.4.110-rc1 review patch.  If anyone has any objections, please let me know.

------------------


commit 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a upstream.

If updating journal superblock fails after journal data has been
flushed, the error is omitted and this will mislead the caller as a
normal case.  In ocfs2, the checkpoint will be treated successfully
and the other node can get the lock to update. Since the sb_start is
still pointing to the old log block, it will rewrite the journal data
during journal recovery by the other node. Thus the new updates will
be overwritten and ocfs2 corrupts.  So in above case we have to return
the error, and ocfs2_commit_cache will take care of the error and
prevent the other node to do update first.  And only after recovering
journal it can do the new updates.

The issue discussion mail can be found at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html
http://comments.gmane.org/gmane.comp.file-systems.ext4/48841

[ Fixed bug in patch which allowed a non-negative error return from
  jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this
  was causing xfstests ext4/306 to fail. -- Ted ]

Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
---
 fs/jbd2/checkpoint.c |  5 ++---
 fs/jbd2/journal.c    | 38 +++++++++++++++++++++++++++++++-------
 include/linux/jbd2.h |  4 ++--
 3 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index dadfedb..6bb5285 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -440,7 +440,7 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
 	unsigned long	blocknr;
 
 	if (is_journal_aborted(journal))
-		return 1;
+		return -EIO;
 
 	if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
 		return 1;
@@ -457,8 +457,7 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
 	if (journal->j_flags & JBD2_BARRIER)
 		blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS, NULL);
 
-	__jbd2_update_log_tail(journal, first_tid, blocknr);
-	return 0;
+	return __jbd2_update_log_tail(journal, first_tid, blocknr);
 }
 
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index f697468..ad64b94 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -823,9 +823,10 @@ int jbd2_journal_get_log_tail(journal_t *journal, tid_t *tid,
  *
  * Requires j_checkpoint_mutex
  */
-void __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
+int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
 {
 	unsigned long freed;
+	int ret;
 
 	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
 
@@ -835,7 +836,10 @@ void __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
 	 * space and if we lose sb update during power failure we'd replay
 	 * old transaction with possibly newly overwritten data.
 	 */
-	jbd2_journal_update_sb_log_tail(journal, tid, block, WRITE_FUA);
+	ret = jbd2_journal_update_sb_log_tail(journal, tid, block, WRITE_FUA);
+	if (ret)
+		goto out;
+
 	write_lock(&journal->j_state_lock);
 	freed = block - journal->j_tail;
 	if (block < journal->j_tail)
@@ -851,6 +855,9 @@ void __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
 	journal->j_tail_sequence = tid;
 	journal->j_tail = block;
 	write_unlock(&journal->j_state_lock);
+
+out:
+	return ret;
 }
 
 /*
@@ -1264,7 +1271,7 @@ static int journal_reset(journal_t *journal)
 	return jbd2_journal_start_thread(journal);
 }
 
-static void jbd2_write_superblock(journal_t *journal, int write_op)
+static int jbd2_write_superblock(journal_t *journal, int write_op)
 {
 	struct buffer_head *bh = journal->j_sb_buffer;
 	int ret;
@@ -1301,7 +1308,10 @@ static void jbd2_write_superblock(journal_t *journal, int write_op)
 		printk(KERN_ERR "JBD2: Error %d detected when updating "
 		       "journal superblock for %s.\n", ret,
 		       journal->j_devname);
+		jbd2_journal_abort(journal, ret);
 	}
+
+	return ret;
 }
 
 /**
@@ -1314,10 +1324,11 @@ static void jbd2_write_superblock(journal_t *journal, int write_op)
  * Update a journal's superblock information about log tail and write it to
  * disk, waiting for the IO to complete.
  */
-void jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
+int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 				     unsigned long tail_block, int write_op)
 {
 	journal_superblock_t *sb = journal->j_superblock;
+	int ret;
 
 	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
 	jbd_debug(1, "JBD2: updating superblock (start %lu, seq %u)\n",
@@ -1326,13 +1337,18 @@ void jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 	sb->s_sequence = cpu_to_be32(tail_tid);
 	sb->s_start    = cpu_to_be32(tail_block);
 
-	jbd2_write_superblock(journal, write_op);
+	ret = jbd2_write_superblock(journal, write_op);
+	if (ret)
+		goto out;
 
 	/* Log is no longer empty */
 	write_lock(&journal->j_state_lock);
 	WARN_ON(!sb->s_sequence);
 	journal->j_flags &= ~JBD2_FLUSHED;
 	write_unlock(&journal->j_state_lock);
+
+out:
+	return ret;
 }
 
 /**
@@ -1785,7 +1801,14 @@ int jbd2_journal_flush(journal_t *journal)
 		return -EIO;
 
 	mutex_lock(&journal->j_checkpoint_mutex);
-	jbd2_cleanup_journal_tail(journal);
+	if (!err) {
+		err = jbd2_cleanup_journal_tail(journal);
+		if (err < 0) {
+			mutex_unlock(&journal->j_checkpoint_mutex);
+			goto out;
+		}
+		err = 0;
+	}
 
 	/* Finally, mark the journal as really needing no recovery.
 	 * This sets s_start==0 in the underlying superblock, which is
@@ -1801,7 +1824,8 @@ int jbd2_journal_flush(journal_t *journal)
 	J_ASSERT(journal->j_head == journal->j_tail);
 	J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
 	write_unlock(&journal->j_state_lock);
-	return 0;
+out:
+	return err;
 }
 
 /**
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2ffbf99..129bca4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -974,7 +974,7 @@ extern struct journal_head * jbd2_journal_get_descriptor_buffer(journal_t *);
 int jbd2_journal_next_log_block(journal_t *, unsigned long long *);
 int jbd2_journal_get_log_tail(journal_t *journal, tid_t *tid,
 			      unsigned long *block);
-void __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
+int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
 void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
 
 /* Commit management */
@@ -1093,7 +1093,7 @@ extern int	   jbd2_journal_recover    (journal_t *journal);
 extern int	   jbd2_journal_wipe       (journal_t *, int);
 extern int	   jbd2_journal_skip_recovery	(journal_t *);
 extern void	   jbd2_journal_update_sb_errno(journal_t *);
-extern void	   jbd2_journal_update_sb_log_tail	(journal_t *, tid_t,
+extern int	   jbd2_journal_update_sb_log_tail	(journal_t *, tid_t,
 				unsigned long, int);
 extern void	   __jbd2_journal_abort_hard	(journal_t *);
 extern void	   jbd2_journal_abort      (journal_t *, int);
-- 
1.9.1

next prev parent reply	other threads:[~2015-10-20  0:50 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-20  0:47 [PATCH 3.4 00/65] 3.4.110-rc1 review lizf
2015-10-20  0:47 ` [PATCH 3.4 01/65] hrtimer: Allow concurrent hrtimer_start() for self restarting timers lizf
2015-10-20  0:47 ` [PATCH 3.4 02/65] mtd: fix: avoid race condition when accessing mtd->usecount lizf
2015-10-20  0:47 ` [PATCH 3.4 03/65] crypto: talitos - avoid memleak in talitos_alg_alloc() lizf
2015-10-20  0:47 ` [PATCH 3.4 04/65] ASoC: wm8737: Fixup setting VMID Impedance control register lizf
2015-10-20  0:47 ` [PATCH 3.4 05/65] ASoC: wm8903: Fix define for WM8903_VMID_RES_250K lizf
2015-10-20  0:47 ` [PATCH 3.4 06/65] ASoC: wm8955: Fix setting wrong register for WM8955_K_8_0_MASK bits lizf
2015-10-20  0:47 ` [PATCH 3.4 07/65] pktgen: adjust spacing in proc file interface output lizf
2015-10-20  0:47 ` [PATCH 3.4 08/65] pktgen: document ability to add same device to several threads lizf
2015-10-20  0:47 ` [PATCH 3.4 09/65] tty/serial: at91: RS485 mode: 0 is valid for delay_rts_after_send lizf
2015-10-20  0:47 ` [PATCH 3.4 10/65] rndis_wlan: harmless issue calling set_bit() lizf
2015-10-20  0:47 ` [PATCH 3.4 11/65] drm/radeon: take the mode_config mutex when dealing with hpds (v2) lizf
2015-10-20  0:47 ` [PATCH 3.4 12/65] usb: dwc3: gadget: return error if command sent to DEPCMD register fails lizf
2015-10-20  0:47 ` [PATCH 3.4 13/65] rcu: Correctly handle non-empty Tiny RCU callback list with none ready lizf
2015-10-20  0:47 ` [PATCH 3.4 14/65] mtd: dc21285: use raw spinlock functions for nw_gpio_lock lizf
2015-10-20  0:47 ` [PATCH 3.4 15/65] staging: rtl8712: prevent buffer overrun in recvbuf2recvframe lizf
2015-10-20  0:47 ` [PATCH 3.4 16/65] usb: core: Fix USB 3.0 devices lost in NOTATTACHED state after a hub port reset lizf
2015-10-20  0:47 ` [PATCH 3.4 17/65] fixing infinite OPEN loop in 4.0 stateid recovery lizf
     [not found]   ` <3B7DC48D-0D7F-4F9A-9CE0-FAC640F60199@netapp.com>
2015-10-21  8:04     ` Zefan Li
2015-10-20  0:47 ` [PATCH 3.4 18/65] NFS: Fix size of NFSACL SETACL operations lizf
2015-10-20  0:47 ` [PATCH 3.4 19/65] SUNRPC: Fix a memory leak in the backchannel code lizf
2015-10-20  0:47 ` [PATCH 3.4 20/65] ipr: Increase default adapter init stage change timeout lizf
2015-10-20  0:47 ` [PATCH 3.4 21/65] ath3k: add support of 13d3:3474 AR3012 device lizf
2015-10-20  0:47 ` [PATCH 3.4 22/65] ath9k: fix DMA stop sequence for AR9003+ lizf
2015-10-20  0:47 ` [PATCH 3.4 23/65] regulator: core: fix constraints output buffer lizf
2015-10-20  0:47 ` [PATCH 3.4 24/65] x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A lizf
2015-10-20  0:47 ` [PATCH 3.4 25/65] dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup lizf
2015-10-20  0:47 ` [PATCH 3.4 26/65] ASoC: wm8960: the enum of "DAC Polarity" should be wm8960_enum[1] lizf
2015-10-20  0:47 ` [PATCH 3.4 27/65] ext4: fix race between truncate and __ext4_journalled_writepage() lizf
2015-10-20  0:47 ` [PATCH 3.4 28/65] Disable write buffering on Toshiba ToPIC95 lizf
2015-10-20  0:47 ` [PATCH 3.4 29/65] sctp: fix ASCONF list handling lizf
2015-10-20  0:47 ` [PATCH 3.4 30/65] jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail() lizf
2015-10-20  0:47 ` [PATCH 3.4 31/65] regmap: Fix regmap_bulk_read in BE mode lizf
2015-10-20  0:47 ` lizf [this message]
2015-10-20  0:47 ` [PATCH 3.4 33/65] ideapad: fix software rfkill setting lizf
2015-10-20  0:47 ` [PATCH 3.4 34/65] mmc: card: Fixup request missing in mmc_blk_issue_rw_rq lizf
2015-10-20  0:47 ` [PATCH 3.4 35/65] nfs: increase size of EXCHANGE_ID name string buffer lizf
2015-10-20  0:47 ` [PATCH 3.4 36/65] bridge: fix br_stp_set_bridge_priority race conditions lizf
2015-10-20  0:47 ` [PATCH 3.4 37/65] ext4: call sync_blockdev() before invalidate_bdev() in put_super() lizf
2015-10-20  0:47 ` [PATCH 3.4 38/65] packet: read num_members once in packet_rcv_fanout() lizf
2015-10-20  0:47 ` [PATCH 3.4 39/65] packet: avoid out of bounds read in round robin fanout lizf
2015-10-20  0:47 ` [PATCH 3.4 40/65] ext4: don't retry file block mapping on bigalloc fs with non-extent file lizf
2015-10-20  0:47 ` [PATCH 3.4 41/65] watchdog: omap: assert the counter being stopped before reprogramming lizf
2015-10-20  0:47 ` [PATCH 3.4 42/65] bridge: multicast: restore router configuration on port link down/up lizf
2015-10-20  0:47 ` [PATCH 3.4 43/65] stmmac: troubleshoot unexpected bits in des0 & des1 lizf
2015-10-20  0:47 ` [PATCH 3.4 44/65] mm: kmemleak: allow safe memory scanning during kmemleak disabling lizf
2015-10-20  0:47 ` [PATCH 3.4 45/65] dell-laptop: Fix allocating & freeing SMI buffer page lizf
2015-10-20  0:47 ` [PATCH 3.4 46/65] tracing/filter: Do not WARN on operand count going below zero lizf
2015-10-20  0:47 ` [PATCH 3.4 47/65] tracing/filter: Do not allow infix to exceed end of string lizf
2015-10-20  0:47 ` [PATCH 3.4 48/65] __bitmap_parselist: fix bug in empty string handling lizf
2015-10-20  0:47 ` [PATCH 3.4 49/65] agp/intel: Fix typo in needs_ilk_vtd_wa() lizf
2015-10-20  0:48 ` [PATCH 3.4 50/65] crush: fix a bug in tree bucket decode lizf
2015-10-20  0:48 ` [PATCH 3.4 51/65] fuse: initialize fc->release before calling it lizf
2015-10-20  0:48 ` [PATCH 3.4 52/65] ACPICA: Tables: Fix an issue that FACS initialization is performed twice lizf
2015-10-20 13:35   ` Moore, Robert
2015-10-21  1:24     ` Zheng, Lv
2015-10-20  0:48 ` [PATCH 3.4 53/65] KVM: x86: make vapics_in_nmi_mode atomic lizf
2015-10-20  0:48 ` [PATCH 3.4 54/65] KVM: x86: properly restore LVT0 lizf
2015-10-20  0:48 ` [PATCH 3.4 55/65] 9p: forgetting to cancel request on interrupted zero-copy RPC lizf
2015-10-20  0:48 ` [PATCH 3.4 56/65] Revert "drm/i915: Don't skip request retirement if the active list is empty" lizf
2015-10-20  0:48 ` [PATCH 3.4 57/65] Revert "drm/radeon: Use drm_calloc_ab for CS relocs" lizf
2015-10-20  0:48 ` [PATCH 3.4 58/65] drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling" lizf
2015-10-20  0:48 ` [PATCH 3.4 59/65] crypto: s390/ghash: Fix incorrect backport of a1cae34e23b1 lizf
2015-10-20  0:48 ` [PATCH 3.4 60/65] ARM: Fix incorrect backport of 0b59d8806a31 lizf
2015-10-20  0:48 ` [PATCH 3.4 61/65] usb: dwc3: Reset the transfer resource index on SET_INTERFACE lizf
2015-10-20  0:48 ` [PATCH 3.4 62/65] jbd2: avoid infinite loop when destroying aborted journal lizf
2015-10-20  0:48 ` [PATCH 3.4 63/65] IB/qib: Change lkey table allocation to support more MRs lizf
2015-10-20  0:48 ` [PATCH 3.4 64/65] dcache: Handle escaped paths in prepend_path lizf
2015-10-20  0:48 ` [PATCH 3.4 65/65] vfs: Test for and handle paths that are unreachable from their mnt_root lizf
2015-10-20  1:06 ` [PATCH 3.4 00/65] 3.4.110-rc1 review Zefan Li
2015-10-20  2:17 ` Guenter Roeck
2015-10-20  7:05   ` Geert Uytterhoeven
2015-10-20  8:23     ` Zefan Li
2015-10-20  8:59       ` Geert Uytterhoeven
2015-10-20 12:56         ` Guenter Roeck
2015-10-20 13:15         ` Guenter Roeck
2015-10-21  8:05           ` Zefan Li

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:dadfedb dfblob:6bb5285 dfblob:f697468 dfblob:ad64b94
dfblob:2ffbf99 dfblob:129bca4 )
 OR (
bs:"[PATCH 3.4 32/65] jbd2: fix ocfs2 corrupt when updating journal superblock fails" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1445302095-4695-32-git-send-email-lizf@kernel.org \
    --to=lizf@kernel.org \
    --cc=joseph.qi@huawei.com \
    --cc=junxiao.bi@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=stable@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).