public inbox for linux-kernel@vger.kernel.org
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	alan@lxorguk.ukuu.org.uk, Josef Bacik <jbacik@fusionio.com>,
	Jan Kara <jack@suse.cz>
Subject: [ 63/76] jbd: Fix assertion failure in commit code due to lacking transaction credits
Date: Thu, 18 Oct 2012 19:47:27 -0700
Message-ID: <20121019024400.263671828@linuxfoundation.org>
In-Reply-To: <20121019024350.087156547@linuxfoundation.org>

3.6-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3 upstream.

ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting an assertion failure in journal_commit_transaction() that checks whether
the transaction has at least as many credits reserved as buffers attached.  The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page straddling i_size, a subsequent write extending the file will see these
buffers and, because they are still mapped although the underlying blocks
have been freed, things go awry from there.
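
A minimal user-space sketch of this failure mode, assuming 4096-byte pages and 1024-byte blocks. The names toy_bh, toy_page, toy_map_page(), toy_truncate() and toy_extend_write() are hypothetical: they model only buffer_mapped and the block number and are not jbd or buffer-layer APIs. Truncating at byte 1536 frees the blocks behind buffers 2 and 3; if those buffers stay mapped, a later extending write lands in a block the filesystem has already freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: sizes and names are assumptions. */
#define TOY_PAGE_SIZE 4096u
#define TOY_BLOCK_SIZE 1024u
#define TOY_BUFS_PER_PAGE (TOY_PAGE_SIZE / TOY_BLOCK_SIZE)

struct toy_bh {
	bool mapped;  /* stands in for buffer_mapped() */
	long blocknr; /* block the buffer points at, -1 when unmapped */
};

struct toy_page {
	struct toy_bh bh[TOY_BUFS_PER_PAGE];
};

/* Map every buffer of the page to consecutive blocks from 'first'. */
static void toy_map_page(struct toy_page *p, long first)
{
	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++) {
		p->bh[i].mapped = true;
		p->bh[i].blocknr = first + (long)i;
	}
}

/*
 * Truncate at byte 'offset' within the page. The filesystem frees the
 * blocks behind buffers starting at or beyond 'offset'. With 'unmap'
 * false the buffers keep their stale mapping (the bug); with 'unmap'
 * true they are unmapped (what the patch does where it can).
 */
static void toy_truncate(struct toy_page *p, unsigned offset, bool unmap)
{
	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++) {
		unsigned curr_off = (unsigned)i * TOY_BLOCK_SIZE;

		if (unmap && offset <= curr_off) {
			p->bh[i].mapped = false;
			p->bh[i].blocknr = -1;
		}
	}
}

/*
 * Extending write into buffer 'i': a mapped buffer is written in place,
 * an unmapped one is given the newly allocated block 'new_block'.
 * Returns the block the data ends up in.
 */
static long toy_extend_write(struct toy_page *p, size_t i, long new_block)
{
	if (!p->bh[i].mapped) {
		p->bh[i].mapped = true;
		p->bh[i].blocknr = new_block;
	}
	return p->bh[i].blocknr;
}
```

After toy_map_page(&p, 100) and toy_truncate(&p, 1536, false), toy_extend_write(&p, 2, 500) returns the freed block 102; with toy_truncate(&p, 1536, true) the same write gets the new block 500, while buffer 1 (which straddles i_size) keeps block 101.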

The assertion failure triggers only by coincidence, because the journal_head
is not properly cleaned up either (luckily so in this case, as otherwise we
would start corrupting the filesystem).
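
The link between a stale journal_head and the assertion is credit accounting. The sketch below is a simplified, hypothetical model (toy_transaction, toy_handle, toy_jh and the toy_* functions are invented for illustration): a credit is charged only the first time a buffer is modified in a transaction, the flag playing the role of b_modified is reset only when the transaction first touches the buffer, and unused credits are returned when the handle stops. A buffer already filed on the running transaction with a stale modified flag is then counted as attached without consuming a credit.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of jbd credit accounting; all names hypothetical. */
struct toy_transaction {
	int outstanding_credits; /* credits reserved by handles */
	int nr_buffers;          /* buffers filed as metadata */
};

struct toy_handle {
	struct toy_transaction *t;
	int credits;             /* credits this handle has not used yet */
};

struct toy_jh {
	struct toy_transaction *transaction; /* transaction it is filed on */
	bool modified;                       /* plays the role of b_modified */
	bool on_metadata_list;
};

static void toy_start(struct toy_handle *h, struct toy_transaction *t, int n)
{
	h->t = t;
	h->credits = n;
	t->outstanding_credits += n;
}

/* The flag is reset only the first time this transaction touches the
 * buffer; a buffer already filed on it keeps whatever flag it had. */
static void toy_get_write_access(struct toy_handle *h, struct toy_jh *jh)
{
	if (jh->transaction != h->t) {
		jh->modified = false;
		jh->transaction = h->t;
	}
}

/* One credit is charged the first time the buffer is modified. */
static void toy_dirty_metadata(struct toy_handle *h, struct toy_jh *jh)
{
	if (!jh->modified) {
		jh->modified = true;
		h->credits--;
	}
	if (!jh->on_metadata_list) {
		jh->on_metadata_list = true;
		h->t->nr_buffers++;
	}
}

/* Unused credits go back to the transaction when the handle stops. */
static void toy_stop(struct toy_handle *h)
{
	h->t->outstanding_credits -= h->credits;
	h->credits = 0;
}

/* The commit-time check: no more buffers attached than credits. */
static bool toy_commit_check(const struct toy_transaction *t)
{
	return t->nr_buffers <= t->outstanding_credits;
}
```

With a stale flag the handle returns its unused credit, leaving one attached buffer against zero outstanding credits; clearing the flag first (as the patch does at zap_buffer and in the commit code) makes the credit get charged and the check pass.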

Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.

We fix the problem by unmapping buffers where possible (in many cases we only
need a buffer attached to a transaction as a placeholder, and it must not be
written out anyway). In one case, a buffer attached to the committing
transaction on a page straddling i_size, we just have to bite the bullet and
wait for the transaction commit to finish.
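
A simplified sketch of the resulting decision, assuming the journal state can be reduced to which list owns the buffer. The enum and function names are hypothetical and several sub-cases of the real journal_unmap_buffer() are omitted. A buffer owned by the committing transaction cannot be touched; on a page straddling i_size the code waits for the commit (log_wait_commit() in the patch) and retries, otherwise it marks the buffer freed and leaves the cleanup to the commit code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not jbd's types or functions. */
enum toy_owner {
	TOY_NOT_JOURNALED,   /* no journal_head at all */
	TOY_CHECKPOINT_ONLY, /* committed, only awaiting checkpoint */
	TOY_RUNNING,         /* attached to the running transaction */
	TOY_COMMITTING,      /* attached to the committing transaction */
};

enum toy_action {
	TOY_UNMAP_NOW,      /* clear mapped/new/req and drop b_bdev now */
	TOY_MARK_FREED,     /* set freed; commit code cleans up later */
	TOY_WAIT_AND_RETRY, /* wait for the commit, then start over */
};

static enum toy_action toy_choose(enum toy_owner owner, bool partial_page)
{
	if (owner != TOY_COMMITTING)
		return TOY_UNMAP_NOW;
	/*
	 * The committing transaction may still write this buffer, so it
	 * cannot be unmapped now. On the page straddling i_size the
	 * buffer_head can be reused by an extending write, so wait.
	 */
	return partial_page ? TOY_WAIT_AND_RETRY : TOY_MARK_FREED;
}

/*
 * Retry loop modelled on the "goto retry" in the patch. Incrementing
 * *waits stands in for log_wait_commit(); we assume that once the
 * commit is done the buffer is only awaiting checkpoint.
 */
static enum toy_action toy_unmap(enum toy_owner owner, bool partial_page,
				 int *waits)
{
	for (;;) {
		enum toy_action action = toy_choose(owner, partial_page);

		if (action != TOY_WAIT_AND_RETRY)
			return action;
		(*waits)++;
		owner = TOY_CHECKPOINT_ONLY;
	}
}
```

Only the committing-plus-partial-page combination costs a wait; every other case is resolved immediately, which keeps the expensive path limited to the one situation where a reusable buffer_head could still be written by the commit.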

Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/jbd/commit.c      |   45 +++++++++++++++++++++++++++--------
 fs/jbd/transaction.c |   64 +++++++++++++++++++++++++++++++++++----------------
 2 files changed, 78 insertions(+), 31 deletions(-)

--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -86,7 +86,12 @@ nope:
 static void release_data_buffer(struct buffer_head *bh)
 {
 	if (buffer_freed(bh)) {
+		WARN_ON_ONCE(buffer_dirty(bh));
 		clear_buffer_freed(bh);
+		clear_buffer_mapped(bh);
+		clear_buffer_new(bh);
+		clear_buffer_req(bh);
+		bh->b_bdev = NULL;
 		release_buffer_page(bh);
 	} else
 		put_bh(bh);
@@ -866,17 +871,35 @@ restart_loop:
 		 * there's no point in keeping a checkpoint record for
 		 * it. */
 
-		/* A buffer which has been freed while still being
-		 * journaled by a previous transaction may end up still
-		 * being dirty here, but we want to avoid writing back
-		 * that buffer in the future after the "add to orphan"
-		 * operation been committed,  That's not only a performance
-		 * gain, it also stops aliasing problems if the buffer is
-		 * left behind for writeback and gets reallocated for another
-		 * use in a different page. */
-		if (buffer_freed(bh) && !jh->b_next_transaction) {
-			clear_buffer_freed(bh);
-			clear_buffer_jbddirty(bh);
+		/*
+		 * A buffer which has been freed while still being journaled by
+		 * a previous transaction.
+		 */
+		if (buffer_freed(bh)) {
+			/*
+			 * If the running transaction is the one containing
+			 * "add to orphan" operation (b_next_transaction !=
+			 * NULL), we have to wait for that transaction to
+			 * commit before we can really get rid of the buffer.
+			 * So just clear b_modified to not confuse transaction
+			 * credit accounting and refile the buffer to
+			 * BJ_Forget of the running transaction. If the just
+			 * committed transaction contains "add to orphan"
+			 * operation, we can completely invalidate the buffer
+			 * now. We are rather thorough in that since the
+			 * buffer may be still accessible when blocksize <
+			 * pagesize and it is attached to the last partial
+			 * page.
+			 */
+			jh->b_modified = 0;
+			if (!jh->b_next_transaction) {
+				clear_buffer_freed(bh);
+				clear_buffer_jbddirty(bh);
+				clear_buffer_mapped(bh);
+				clear_buffer_new(bh);
+				clear_buffer_req(bh);
+				bh->b_bdev = NULL;
+			}
 		}
 
 		if (buffer_jbddirty(bh)) {
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1843,15 +1843,16 @@ static int __dispose_buffer(struct journ
  * We're outside-transaction here.  Either or both of j_running_transaction
  * and j_committing_transaction may be NULL.
  */
-static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
+static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
+				int partial_page)
 {
 	transaction_t *transaction;
 	struct journal_head *jh;
 	int may_free = 1;
-	int ret;
 
 	BUFFER_TRACE(bh, "entry");
 
+retry:
 	/*
 	 * It is safe to proceed here without the j_list_lock because the
 	 * buffers cannot be stolen by try_to_free_buffers as long as we are
@@ -1879,10 +1880,18 @@ static int journal_unmap_buffer(journal_
 	 * clear the buffer dirty bit at latest at the moment when the
 	 * transaction marking the buffer as freed in the filesystem
 	 * structures is committed because from that moment on the
-	 * buffer can be reallocated and used by a different page.
+	 * block can be reallocated and used by a different page.
 	 * Since the block hasn't been freed yet but the inode has
 	 * already been added to orphan list, it is safe for us to add
 	 * the buffer to BJ_Forget list of the newest transaction.
+	 *
+	 * Also we have to clear buffer_mapped flag of a truncated buffer
+	 * because the buffer_head may be attached to the page straddling
+	 * i_size (can happen only when blocksize < pagesize) and thus the
+	 * buffer_head can be reused when the file is extended again. So we end
+	 * up keeping around invalidated buffers attached to transactions'
+	 * BJ_Forget list just to stop checkpointing code from cleaning up
+	 * the transaction this buffer was modified in.
 	 */
 	transaction = jh->b_transaction;
 	if (transaction == NULL) {
@@ -1909,13 +1918,9 @@ static int journal_unmap_buffer(journal_
 			 * committed, the buffer won't be needed any
 			 * longer. */
 			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
-			ret = __dispose_buffer(jh,
+			may_free = __dispose_buffer(jh,
 					journal->j_running_transaction);
-			journal_put_journal_head(jh);
-			spin_unlock(&journal->j_list_lock);
-			jbd_unlock_bh_state(bh);
-			spin_unlock(&journal->j_state_lock);
-			return ret;
+			goto zap_buffer;
 		} else {
 			/* There is no currently-running transaction. So the
 			 * orphan record which we wrote for this file must have
@@ -1923,13 +1928,9 @@ static int journal_unmap_buffer(journal_
 			 * the committing transaction, if it exists. */
 			if (journal->j_committing_transaction) {
 				JBUFFER_TRACE(jh, "give to committing trans");
-				ret = __dispose_buffer(jh,
+				may_free = __dispose_buffer(jh,
 					journal->j_committing_transaction);
-				journal_put_journal_head(jh);
-				spin_unlock(&journal->j_list_lock);
-				jbd_unlock_bh_state(bh);
-				spin_unlock(&journal->j_state_lock);
-				return ret;
+				goto zap_buffer;
 			} else {
 				/* The orphan record's transaction has
 				 * committed.  We can cleanse this buffer */
@@ -1950,10 +1951,24 @@ static int journal_unmap_buffer(journal_
 		}
 		/*
 		 * The buffer is committing, we simply cannot touch
-		 * it. So we just set j_next_transaction to the
-		 * running transaction (if there is one) and mark
-		 * buffer as freed so that commit code knows it should
-		 * clear dirty bits when it is done with the buffer.
+		 * it. If the page is straddling i_size we have to wait
+		 * for commit and try again.
+		 */
+		if (partial_page) {
+			tid_t tid = journal->j_committing_transaction->t_tid;
+
+			journal_put_journal_head(jh);
+			spin_unlock(&journal->j_list_lock);
+			jbd_unlock_bh_state(bh);
+			spin_unlock(&journal->j_state_lock);
+			log_wait_commit(journal, tid);
+			goto retry;
+		}
+		/*
+		 * OK, buffer won't be reachable after truncate. We just set
+		 * b_next_transaction to the running transaction (if there is
+		 * one) and mark buffer as freed so that commit code knows it
+		 * should clear dirty bits when it is done with the buffer.
 		 */
 		set_buffer_freed(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
@@ -1976,6 +1991,14 @@ static int journal_unmap_buffer(journal_
 	}
 
 zap_buffer:
+	/*
+	 * This is tricky. Although the buffer is truncated, it may be reused
+	 * if blocksize < pagesize and it is attached to the page straddling
+	 * EOF. Since the buffer might have been added to BJ_Forget list of the
+	 * running transaction, journal_get_write_access() won't clear
+	 * b_modified and credit accounting gets confused. So clear b_modified
+	 * here. */
+	jh->b_modified = 0;
 	journal_put_journal_head(jh);
 zap_buffer_no_jh:
 	spin_unlock(&journal->j_list_lock);
@@ -2024,7 +2047,8 @@ void journal_invalidatepage(journal_t *j
 		if (offset <= curr_off) {
 			/* This block is wholly outside the truncation point */
 			lock_buffer(bh);
-			may_free &= journal_unmap_buffer(journal, bh);
+			may_free &= journal_unmap_buffer(journal, bh,
+							 offset > 0);
 			unlock_buffer(bh);
 		}
 		curr_off = next_off;
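
The partial_page argument comes from the caller: journal_invalidatepage() passes offset > 0, where offset is the truncation point within the page. The hypothetical sketch below mirrors that loop with a plain buffer index instead of walking the page's buffer ring, and with a callback standing in for journal_unmap_buffer(); the page size is an assumption. A non-zero offset means the page itself is kept, so its tail buffers may be reused later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch; the page size and callback type are assumptions. */
#define TOY_PAGE_SIZE 4096u

/* Stands in for journal_unmap_buffer(); returns whether it may be freed. */
typedef bool (*toy_unmap_fn)(unsigned buf_index, bool partial_page);

/*
 * Walk the buffers of one page and unmap those starting at or beyond
 * 'offset', the truncation point within the page. offset > 0 means the
 * page straddles i_size. 'blocksize' must be non-zero and divide
 * TOY_PAGE_SIZE.
 */
static bool toy_invalidatepage(unsigned offset, unsigned blocksize,
			       toy_unmap_fn unmap)
{
	bool may_free = true;
	unsigned curr_off = 0;

	for (unsigned i = 0; curr_off < TOY_PAGE_SIZE; i++) {
		unsigned next_off = curr_off + blocksize;

		if (offset <= curr_off) /* wholly outside the truncation point */
			may_free &= unmap(i, offset > 0);
		curr_off = next_off;
	}
	return may_free;
}

/* Recording callback used by the usage example. */
static unsigned toy_seen_mask;
static bool toy_seen_partial;

static bool toy_record(unsigned buf_index, bool partial_page)
{
	toy_seen_mask |= 1u << buf_index;
	toy_seen_partial = partial_page;
	return true;
}
```

With offset 1536 and 1024-byte blocks, only buffers 2 and 3 reach the callback, with partial_page true; with offset 0 all four buffers reach it with partial_page false.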


