From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
	Zwane Mwaikambo <zwane@arm.linux.org.uk>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Dave Jones <davej@redhat.com>,
	Chuck Wolber <chuckw@quantumlinux.com>,
	Chris Wedgwood <reviews@ml.cw.f00f.org>,
	Michael Krufky <mkrufky@linuxtv.org>,
	Chuck Ebbert <cebbert@redhat.com>,
	Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
	Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
	Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Mingming Cao <cmm@us.ibm.com>,
	Jan Kara <jack@suse.cz>
Subject: [patch 08/33] jbd: fix race between free buffer and commit transaction
Date: Mon, 4 Aug 2008 13:15:00 -0700
Message-ID: <20080804201500.GI1139@suse.de>
In-Reply-To: <20080804201321.GA1139@suse.de>

[-- Attachment #1: jbd-fix-race-between-free-buffer-and-commit-transaction.patch --]
[-- Type: text/plain, Size: 4891 bytes --]

2.6.25-stable review patch.  If anyone has any objections, please let us
know.

------------------

From: Mingming Cao <cmm@us.ibm.com>

commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91 upstream

journal_try_to_free_buffers() can race with the jbd commit code when the
latter holds a reference to a buffer while waiting for the data buffer to
be flushed to disk.  If the caller of journal_try_to_free_buffers()
requires the buffers to be released, it treats the failure as an error and
returns it to its own caller.  We have seen direct IO fail because of this
race.  Some callers of releasepage() also expect the buffers to be dropped
when they pass a GFP_KERNEL mask to
releasepage()->journal_try_to_free_buffers().

With this patch, if the caller passes __GFP_WAIT and __GFP_FS to indicate
that the call may wait, and try_to_free_buffers() fails, we wait for
journal_commit_transaction() to finish committing the current committing
transaction and then try to free those buffers again.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
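
Reviewer's illustration, not part of the patch: the retry described in the
changelog can be modelled in userspace roughly as below.  This is only a
sketch under stated assumptions.  Every model_* name, the MODEL_GFP_* values
and struct model_page are hypothetical stand-ins invented for this note; none
of them is kernel API, and the real code waits in log_wait_commit() rather
than clearing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_WAIT and __GFP_FS; values are arbitrary. */
#define MODEL_GFP_WAIT 0x1u	/* caller may sleep */
#define MODEL_GFP_FS   0x2u	/* caller may re-enter the filesystem */

/* Hypothetical page model: counts references held by the commit code. */
struct model_page {
	int commit_refs;
};

/* Models try_to_free_buffers(): succeeds only if nobody holds a reference. */
static int model_try_to_free(struct model_page *page)
{
	assert(page->commit_refs >= 0);
	return page->commit_refs == 0 ? 1 : 0;
}

/* Models waiting for the commit: once it finishes, its references are gone. */
static void model_wait_for_commit(struct model_page *page)
{
	page->commit_refs = 0;
}

/* Returns 1 on success and 0 on failure, as the patched function does. */
static int model_journal_try_to_free(struct model_page *page,
				     unsigned int gfp_mask)
{
	int ret = model_try_to_free(page);
	/* Both bits must be set before the caller is allowed to block. */
	bool may_block = (gfp_mask & MODEL_GFP_WAIT) &&
			 (gfp_mask & MODEL_GFP_FS);

	if (ret == 0 && may_block) {
		model_wait_for_commit(page);
		ret = model_try_to_free(page);
	}
	return ret;
}
```

In this model a caller that passes only one of the two bits still sees the
failure, which mirrors the condition tested in the patch; only a caller that
passes both gets the wait-and-retry behaviour.
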
 fs/jbd/transaction.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++++--
 mm/filemap.c         |    3 --
 2 files changed, 56 insertions(+), 4 deletions(-)

--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1633,12 +1633,42 @@ out:
 	return;
 }
 
+/*
+ * journal_try_to_free_buffers() could race with journal_commit_transaction().
+ * The latter might still hold a count on buffers when inspecting
+ * them on t_syncdata_list or t_locked_list.
+ *
+ * journal_try_to_free_buffers() will call this function to
+ * wait for the current transaction to finish syncing data buffers, before
+ * trying to free that buffer.
+ *
+ * Takes journal->j_state_lock itself, so call it without that lock held.
+ */
+static void journal_wait_for_transaction_sync_data(journal_t *journal)
+{
+	transaction_t *transaction = NULL;
+	tid_t tid;
+
+	spin_lock(&journal->j_state_lock);
+	transaction = journal->j_committing_transaction;
+
+	if (!transaction) {
+		spin_unlock(&journal->j_state_lock);
+		return;
+	}
+
+	tid = transaction->t_tid;
+	spin_unlock(&journal->j_state_lock);
+	log_wait_commit(journal, tid);
+}
 
 /**
  * int journal_try_to_free_buffers() - try to free page buffers.
  * @journal: journal for operation
  * @page: to try and free
- * @unused_gfp_mask: unused
+ * @gfp_mask: we use the mask to detect how hard we should try to release
+ * buffers. If __GFP_WAIT and __GFP_FS are set, we wait for commit code to
+ * release the buffers.
  *
  *
  * For all the buffers on this page,
@@ -1667,9 +1697,11 @@ out:
  * journal_try_to_free_buffer() is changing its state.  But that
  * cannot happen because we never reallocate freed data as metadata
  * while the data is part of a transaction.  Yes?
+ *
+ * Return 0 on failure, 1 on success
  */
 int journal_try_to_free_buffers(journal_t *journal,
-				struct page *page, gfp_t unused_gfp_mask)
+				struct page *page, gfp_t gfp_mask)
 {
 	struct buffer_head *head;
 	struct buffer_head *bh;
@@ -1698,7 +1730,28 @@ int journal_try_to_free_buffers(journal_
 		if (buffer_jbd(bh))
 			goto busy;
 	} while ((bh = bh->b_this_page) != head);
+
 	ret = try_to_free_buffers(page);
+
+	/*
+	 * There are a number of places where journal_try_to_free_buffers()
+	 * could race with journal_commit_transaction(); the latter may still
+	 * hold a reference to the buffers while processing them, in which
+	 * case try_to_free_buffers() fails to free them. Some callers of
+	 * releasepage() require the page buffers to be dropped and otherwise
+	 * treat the failure as an error (such as generic_file_direct_IO()).
+	 *
+	 * So, if the caller of try_to_release_page() wants the synchronous
+	 * behaviour (i.e. make sure buffers are dropped upon return),
+	 * let's wait for the current transaction to finish flushing its
+	 * dirty data buffers, then try to free those buffers again,
+	 * with the journal locked.
+	 */
+	if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
+		journal_wait_for_transaction_sync_data(journal);
+		ret = try_to_free_buffers(page);
+	}
+
 busy:
 	return ret;
 }
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2574,9 +2574,8 @@ out:
  * Otherwise return zero.
  *
  * The @gfp_mask argument specifies whether I/O may be performed to release
- * this page (__GFP_IO), and whether the call may block (__GFP_WAIT).
+ * this page (__GFP_IO), and whether the call may block (__GFP_WAIT & __GFP_FS).
  *
- * NOTE: @gfp_mask may go away, and this function may become non-blocking.
  */
 int try_to_release_page(struct page *page, gfp_t gfp_mask)
 {
