public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg KH <gregkh@linuxfoundation.org>,
	Hugh Dickins <hughd@google.com>, Jens Axboe <axboe@kernel.dk>
Subject: [ 27/46] block: replace __getblk_slow misfix by grow_dev_page fix
Date: Wed, 12 Sep 2012 16:39:17 -0700	[thread overview]
Message-ID: <20120912233820.511734466@linuxfoundation.org> (raw)
In-Reply-To: <20120912233817.662663809@linuxfoundation.org>

From: Greg KH <gregkh@linuxfoundation.org>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit 676ce6d5ca3098339c028d44fe0427d1566a4d2d upstream.

Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow")
is not good: a successful call to grow_buffers() cannot guarantee
that the page won't be reclaimed before the immediate next call to
__find_get_block(), which is why there was always a loop there.

Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
inode #19278: block 664: comm cc1: unable to read itable block" on console,
which pointed to this commit.

I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
sometimes fails from a missing header file, under memory pressure on
ppc G5.  I've never seen this on x86, and I've never seen it on 3.5-rc7
itself, despite that commit being in there: bisection pointed to an
irrelevant pinctrl merge, but hard to tell when failure takes between
18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).

(I've since found such __ext4_get_inode_loc errors in /var/log/messages
from previous weeks: why the message never appeared on console until
yesterday morning is a mystery for another day.)

Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus
a checkpatch nitfix).  Simplify the interface between grow_buffers()
and grow_dev_page(), and avoid the infinite loop beyond end of device
by instead checking init_page_buffers()'s end_block there (I presume
that's more efficient than a repeated call to blkdev_max_block()),
returning -ENXIO to __getblk_slow() in that case.

And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
comment, but that is worrying: are all users of __getblk() really
now prepared for a NULL bh beyond end of device, or will some oops??

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/buffer.c |   66 +++++++++++++++++++++++++++---------------------------------
 1 file changed, 30 insertions(+), 36 deletions(-)

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -961,7 +961,7 @@ link_dev_buffers(struct page *page, stru
 /*
  * Initialise the state of a blockdev page's buffers.
  */ 
-static void
+static sector_t
 init_page_buffers(struct page *page, struct block_device *bdev,
 			sector_t block, int size)
 {
@@ -983,33 +983,41 @@ init_page_buffers(struct page *page, str
 		block++;
 		bh = bh->b_this_page;
 	} while (bh != head);
+
+	/*
+	 * Caller needs to validate requested block against end of device.
+	 */
+	return end_block;
 }
 
 /*
  * Create the page-cache page that contains the requested block.
  *
- * This is user purely for blockdev mappings.
+ * This is used purely for blockdev mappings.
  */
-static struct page *
+static int
 grow_dev_page(struct block_device *bdev, sector_t block,
-		pgoff_t index, int size)
+		pgoff_t index, int size, int sizebits)
 {
 	struct inode *inode = bdev->bd_inode;
 	struct page *page;
 	struct buffer_head *bh;
+	sector_t end_block;
+	int ret = 0;		/* Will call free_more_memory() */
 
 	page = find_or_create_page(inode->i_mapping, index,
 		(mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE);
 	if (!page)
-		return NULL;
+		return ret;
 
 	BUG_ON(!PageLocked(page));
 
 	if (page_has_buffers(page)) {
 		bh = page_buffers(page);
 		if (bh->b_size == size) {
-			init_page_buffers(page, bdev, block, size);
-			return page;
+			end_block = init_page_buffers(page, bdev,
+						index << sizebits, size);
+			goto done;
 		}
 		if (!try_to_free_buffers(page))
 			goto failed;
@@ -1029,15 +1037,15 @@ grow_dev_page(struct block_device *bdev,
 	 */
 	spin_lock(&inode->i_mapping->private_lock);
 	link_dev_buffers(page, bh);
-	init_page_buffers(page, bdev, block, size);
+	end_block = init_page_buffers(page, bdev, index << sizebits, size);
 	spin_unlock(&inode->i_mapping->private_lock);
-	return page;
+done:
+	ret = (block < end_block) ? 1 : -ENXIO;
 
 failed:
-	BUG();
 	unlock_page(page);
 	page_cache_release(page);
-	return NULL;
+	return ret;
 }
 
 /*
@@ -1047,7 +1055,6 @@ failed:
 static int
 grow_buffers(struct block_device *bdev, sector_t block, int size)
 {
-	struct page *page;
 	pgoff_t index;
 	int sizebits;
 
@@ -1071,22 +1078,14 @@ grow_buffers(struct block_device *bdev,
 			bdevname(bdev, b));
 		return -EIO;
 	}
-	block = index << sizebits;
+
 	/* Create a page with the proper size buffers.. */
-	page = grow_dev_page(bdev, block, index, size);
-	if (!page)
-		return 0;
-	unlock_page(page);
-	page_cache_release(page);
-	return 1;
+	return grow_dev_page(bdev, block, index, size, sizebits);
 }
 
 static struct buffer_head *
 __getblk_slow(struct block_device *bdev, sector_t block, int size)
 {
-	int ret;
-	struct buffer_head *bh;
-
 	/* Size must be multiple of hard sectorsize */
 	if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
 			(size < 512 || size > PAGE_SIZE))) {
@@ -1099,21 +1098,20 @@ __getblk_slow(struct block_device *bdev,
 		return NULL;
 	}
 
-retry:
-	bh = __find_get_block(bdev, block, size);
-	if (bh)
-		return bh;
+	for (;;) {
+		struct buffer_head *bh;
+		int ret;
 
-	ret = grow_buffers(bdev, block, size);
-	if (ret == 0) {
-		free_more_memory();
-		goto retry;
-	} else if (ret > 0) {
 		bh = __find_get_block(bdev, block, size);
 		if (bh)
 			return bh;
+
+		ret = grow_buffers(bdev, block, size);
+		if (ret < 0)
+			return NULL;
+		if (ret == 0)
+			free_more_memory();
 	}
-	return NULL;
 }
 
 /*
@@ -1369,10 +1367,6 @@ EXPORT_SYMBOL(__find_get_block);
  * which corresponds to the passed block_device, block and size. The
  * returned buffer has its reference count incremented.
  *
- * __getblk() cannot fail - it just keeps trying.  If you pass it an
- * illegal block number, __getblk() will happily return a buffer_head
- * which represents the non-existent block.  Very weird.
- *
  * __getblk() will lock up the machine if grow_dev_page's try_to_free_buffers()
  * attempt is failing.  FIXME, perhaps?
  */



  parent reply	other threads:[~2012-09-12 23:47 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-12 23:38 [ 00/46] 3.0.43-stable review Greg Kroah-Hartman
2012-09-12 23:38 ` [ 01/46] USB: vt6656: remove __devinit* from the struct usb_device_id table Greg Kroah-Hartman
2012-09-12 23:38 ` [ 02/46] USB: emi62: " Greg Kroah-Hartman
2012-09-12 23:38 ` [ 03/46] ALSA: hda - fix Copyright debug message Greg Kroah-Hartman
2012-09-12 23:38 ` [ 04/46] ARM: 7487/1: mm: avoid setting nG bit for user mappings that arent present Greg Kroah-Hartman
2012-09-12 23:38 ` [ 05/46] ARM: 7488/1: mm: use 5 bits for swapfile type encoding Greg Kroah-Hartman
2012-09-12 23:38 ` [ 06/46] ARM: 7489/1: errata: fix workaround for erratum #720789 on UP systems Greg Kroah-Hartman
2012-09-12 23:38 ` [ 07/46] ARM: S3C24XX: Fix s3c2410_dma_enqueue parameters Greg Kroah-Hartman
2012-09-12 23:38 ` [ 08/46] ARM: imx: select CPU_FREQ_TABLE when needed Greg Kroah-Hartman
2012-09-12 23:38 ` [ 09/46] ASoC: wm9712: Fix microphone source selection Greg Kroah-Hartman
2012-09-12 23:39 ` [ 10/46] vfs: missed source of ->f_pos races Greg Kroah-Hartman
2012-09-12 23:39 ` [ 11/46] vfs: canonicalize create mode in build_open_flags() Greg Kroah-Hartman
2012-09-12 23:39 ` [ 12/46] alpha: Dont export SOCK_NONBLOCK to user space Greg Kroah-Hartman
2012-09-12 23:39 ` [ 13/46] USB: winbond: remove __devinit* from the struct usb_device_id table Greg Kroah-Hartman
2012-09-12 23:39 ` [ 14/46] mm: hugetlbfs: correctly populate shared pmd Greg Kroah-Hartman
2012-09-12 23:39 ` [ 15/46] NFSv3: Ensure that do_proc_get_root() reports errors correctly Greg Kroah-Hartman
2012-09-12 23:39 ` [ 16/46] NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done Greg Kroah-Hartman
2012-09-16 16:33   ` Ben Hutchings
2012-09-16 16:37     ` Greg Kroah-Hartman
2012-09-17 13:05       ` Myklebust, Trond
2012-09-19  9:49         ` Boaz Harrosh
2012-09-12 23:39 ` [ 17/46] NFS: Alias the nfs module to nfs4 Greg Kroah-Hartman
2012-09-12 23:39 ` [ 18/46] audit: dont free_chunk() after fsnotify_add_mark() Greg Kroah-Hartman
2012-09-12 23:39 ` [ 19/46] audit: fix refcounting in audit-tree Greg Kroah-Hartman
2012-09-12 23:39 ` [ 20/46] svcrpc: fix BUG() in svc_tcp_clear_pages Greg Kroah-Hartman
2012-09-12 23:39 ` [ 21/46] svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping Greg Kroah-Hartman
2012-09-12 23:39 ` [ 22/46] svcrpc: sends on closed socket should stop immediately Greg Kroah-Hartman
2012-09-12 23:39 ` [ 23/46] cciss: fix incorrect scsi status reporting Greg Kroah-Hartman
2012-09-12 23:39 ` [ 24/46] ACPI: export symbol acpi_get_table_with_size Greg Kroah-Hartman
2012-09-15  0:22   ` Ben Hutchings
2012-09-15  3:13     ` Greg Kroah-Hartman
2012-09-12 23:39 ` [ 25/46] ath9k: fix decrypt_error initialization in ath_rx_tasklet() Greg Kroah-Hartman
2012-09-12 23:39 ` [ 26/46] PCI: EHCI: Fix crash during hibernation on ASUS computers Greg Kroah-Hartman
2012-09-12 23:39 ` Greg Kroah-Hartman [this message]
2012-09-12 23:39 ` [ 28/46] USB: spca506: remove __devinit* from the struct usb_device_id table Greg Kroah-Hartman
2012-09-12 23:39 ` [ 29/46] USB: p54usb: " Greg Kroah-Hartman
2012-09-12 23:39 ` [ 30/46] USB: rtl8187: " Greg Kroah-Hartman
2012-09-12 23:39 ` [ 31/46] USB: smsusb: " Greg Kroah-Hartman
2012-09-12 23:39 ` [ 32/46] USB: CDC ACM: Fix NULL pointer dereference Greg Kroah-Hartman
2012-09-12 23:39 ` [ 33/46] powerpc: Fix DSCR inheritance in copy_thread() Greg Kroah-Hartman
2012-09-12 23:39 ` [ 34/46] powerpc: Restore correct DSCR in context switch Greg Kroah-Hartman
2012-09-12 23:39 ` [ 35/46] Remove user-triggerable BUG from mpol_to_str Greg Kroah-Hartman
2012-09-12 23:39 ` [ 36/46] SCSI: megaraid_sas: Move poll_aen_lock initializer Greg Kroah-Hartman
2012-09-12 23:39 ` [ 37/46] SCSI: mpt2sas: Fix for Driver oops, when loading driver with max_queue_depth command line option to a very small value Greg Kroah-Hartman
2012-09-12 23:39 ` [ 38/46] SCSI: Fix Device not ready issue on mpt2sas Greg Kroah-Hartman
2012-09-12 23:39 ` [ 39/46] udf: Fix data corruption for files in ICB Greg Kroah-Hartman
2012-09-12 23:39 ` [ 40/46] ext3: Fix fdatasync() for files with only i_size changes Greg Kroah-Hartman
2012-09-12 23:39 ` [ 41/46] fuse: fix retrieve length Greg Kroah-Hartman
2012-09-12 23:39 ` [ 42/46] Input: i8042 - add Gigabyte T1005 series netbooks to noloop table Greg Kroah-Hartman
2012-09-12 23:39 ` [ 43/46] drm/vmwgfx: add MODULE_DEVICE_TABLE so vmwgfx loads at boot Greg Kroah-Hartman
2012-09-12 23:39 ` [ 44/46] PARISC: Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts Greg Kroah-Hartman
2012-09-12 23:39 ` [ 45/46] dccp: check ccid before dereferencing Greg Kroah-Hartman
2012-09-12 23:39 ` [ 46/46] hwmon: (asus_atk0110) Add quirk for Asus M5A78L Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120912233820.511734466@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox