[PATCH] block: replace __getblk_slow misfix by grow_dev

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
@ 2012-08-22  1:33 Hugh Dickins
  2012-08-22  9:12 ` Richard W.M. Jones
  2012-08-22 11:52 ` Richard W.M. Jones
  0 siblings, 2 replies; 10+ messages in thread
From: Hugh Dickins @ 2012-08-22  1:33 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, Torsten Hilbrich,
	Richard W.M. Jones, Josh Boyer, linux-kernel

Jeff,

Your commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow"),
already gone into 3.* stable, is not good.  Could you and your testers
please give this alternative a try - I think it should work, and have
started it on a few days' memory load on 3.5, but not tried your case.

But, see final paragraph of comment below, I'm worried whether
all __getblk() users are actually ready for your new NULL case.

Thanks,
Hugh

[PATCH] block: replace __getblk_slow misfix by grow_dev_page fix

Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow")
is not good: a successful call to grow_buffers() cannot guarantee
that the page won't be reclaimed before the immediate next call to
__find_get_block(), which is why there was always a loop there.

Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
inode #19278: block 664: comm cc1: unable to read itable block" on console,
which pointed to this commit.

I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
sometimes fails from a missing header file, under memory pressure on
ppc G5.  I've never seen this on x86, and I've never seen it on 3.5-rc7
itself, despite that commit being in there: bisection pointed to an
irrelevant pinctrl merge, but hard to tell when failure takes between
18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).

(I've since found such __ext4_get_inode_loc errors in /var/log/messages
from previous weeks: why the message never appeared on console until
yesterday morning is a mystery for another day.)

Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus
a checkpatch nitfix).  Avoid the infinite loop beyond end of device
by instead checking buffer_mapped(bh) in grow_dev_page() (I presume
that's more efficient than a repeated call to blkdev_max_block()),
and returning -ENXIO to __getblk_slow() in that case.

And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
comment, but that is worrying: are all users of __getblk() really
now prepared for a NULL bh beyond end of device, or will some oops??

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # 3.0 3.2 3.4 3.5
---

 fs/buffer.c |   43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

--- 3.6-rc2/fs/buffer.c	2012-08-04 09:19:20.644022328 -0700
+++ linux/fs/buffer.c	2012-08-21 08:49:32.333908590 -0700
@@ -950,6 +950,7 @@ grow_dev_page(struct block_device *bdev,
 	struct inode *inode = bdev->bd_inode;
 	struct page *page;
 	struct buffer_head *bh;
+	int err = 0;

 	page = find_or_create_page(inode->i_mapping, index,
 		(mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE);
@@ -962,7 +963,11 @@ grow_dev_page(struct block_device *bdev,
 		bh = page_buffers(page);
 		if (bh->b_size == size) {
 			init_page_buffers(page, bdev, block, size);
-			return page;
+			if (buffer_mapped(bh))
+				return page;
+
+			err = -ENXIO;
+			goto failed;
 		}
 		if (!try_to_free_buffers(page))
 			goto failed;
@@ -984,12 +989,14 @@ grow_dev_page(struct block_device *bdev,
 	link_dev_buffers(page, bh);
 	init_page_buffers(page, bdev, block, size);
 	spin_unlock(&inode->i_mapping->private_lock);
-	return page;
+	if (buffer_mapped(bh))
+		return page;

+	err = -ENXIO;
 failed:
 	unlock_page(page);
 	page_cache_release(page);
-	return NULL;
+	return ERR_PTR(err);
 }

 /*
@@ -1026,8 +1033,8 @@ grow_buffers(struct block_device *bdev,
 	block = index << sizebits;
 	/* Create a page with the proper size buffers.. */
 	page = grow_dev_page(bdev, block, index, size);
-	if (!page)
-		return 0;
+	if (IS_ERR_OR_NULL(page))
+		return PTR_RET(page);
 	unlock_page(page);
 	page_cache_release(page);
 	return 1;
@@ -1036,9 +1043,6 @@ grow_buffers(struct block_device *bdev,
 static struct buffer_head *
 __getblk_slow(struct block_device *bdev, sector_t block, int size)
 {
-	int ret;
-	struct buffer_head *bh;
-
 	/* Size must be multiple of hard sectorsize */
 	if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
 			(size < 512 || size > PAGE_SIZE))) {
@@ -1051,21 +1055,20 @@ __getblk_slow(struct block_device *bdev,
 		return NULL;
 	}

-retry:
-	bh = __find_get_block(bdev, block, size);
-	if (bh)
-		return bh;
+	for (;;) {
+		struct buffer_head *bh;
+		int ret;

-	ret = grow_buffers(bdev, block, size);
-	if (ret == 0) {
-		free_more_memory();
-		goto retry;
-	} else if (ret > 0) {
 		bh = __find_get_block(bdev, block, size);
 		if (bh)
 			return bh;
+
+		ret = grow_buffers(bdev, block, size);
+		if (ret < 0)
+			return NULL;
+		if (ret == 0)
+			free_more_memory();
 	}
-	return NULL;
 }

 /*
@@ -1321,10 +1324,6 @@ EXPORT_SYMBOL(__find_get_block);
  * which corresponds to the passed block_device, block and size. The
  * returned buffer has its reference count incremented.
  *
- * __getblk() cannot fail - it just keeps trying.  If you pass it an
- * illegal block number, __getblk() will happily return a buffer_head
- * which represents the non-existent block.  Very weird.
- *
  * __getblk() will lock up the machine if grow_dev_page's try_to_free_buffers()
  * attempt is failing.  FIXME, perhaps?
  */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-22  1:33 [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix Hugh Dickins
@ 2012-08-22  9:12 ` Richard W.M. Jones
  2012-08-22 11:52 ` Richard W.M. Jones
  1 sibling, 0 replies; 10+ messages in thread
From: Richard W.M. Jones @ 2012-08-22  9:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Jeff Moyer, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

On Tue, Aug 21, 2012 at 06:33:45PM -0700, Hugh Dickins wrote:
> Jeff,
> 
> Your commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow"),
> already gone into 3.* stable, is not good.  Could you and your testers
> please give this alternative a try - I think it should work, and have
> started it on a few days' memory load on 3.5, but not tried your case.

I'll try your patch later today, but the test is very simple:

- Create a 1024 byte empty (all zeroes) partition and then try to mount it.

Failure was that the mount command would go into an infinite loop
using 100% CPU.  Success is that it gives an error.

More about it here including a very simple libguestfs script to
reproduce it:

  https://bugzilla.redhat.com/show_bug.cgi?id=835019#c0

and a virtual disk image that also demonstrates the problem without
needing anything except a VM running the test kernel:

  https://bugzilla.redhat.com/show_bug.cgi?id=835019#c1

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-22  1:33 [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix Hugh Dickins
  2012-08-22  9:12 ` Richard W.M. Jones
@ 2012-08-22 11:52 ` Richard W.M. Jones
  2012-08-23  4:56   ` Hugh Dickins
  1 sibling, 1 reply; 10+ messages in thread
From: Richard W.M. Jones @ 2012-08-22 11:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Jeff Moyer, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

On Tue, Aug 21, 2012 at 06:33:45PM -0700, Hugh Dickins wrote:
> Jeff,
> 
> Your commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow"),
> already gone into 3.* stable, is not good.  Could you and your testers
> please give this alternative a try - I think it should work, and have
> started it on a few days' memory load on 3.5, but not tried your case.

I have tested your patch and it does NOT fix the problem in
http://bugzilla.redhat.com/835019
Kernel 3.6.0 + your patch => mount goes into a loop when mounting
a small empty partition.  Please do NOT apply this as it will
cause a regression!

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-22 11:52 ` Richard W.M. Jones
@ 2012-08-23  4:56   ` Hugh Dickins
  2012-08-23 10:20     ` Jens Axboe
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hugh Dickins @ 2012-08-23  4:56 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Jeff Moyer, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

On Wed, 22 Aug 2012, Richard W.M. Jones wrote:
> On Tue, Aug 21, 2012 at 06:33:45PM -0700, Hugh Dickins wrote:
> > Jeff,
> > 
> > Your commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow"),
> > already gone into 3.* stable, is not good.  Could you and your testers
> > please give this alternative a try - I think it should work, and have
> > started it on a few days' memory load on 3.5, but not tried your case.
> 
> I have tested your patch and it does NOT fix the problem in
> http://bugzilla.redhat.com/835019
> Kernel 3.6.0 + your patch => mount goes into a loop when mounting
> a small empty partition.  Please do NOT apply this as it will
> cause a regression!

That was all very helpful information that you provided, thank you.
Sorry, I had missed how "block" is massaged as it's passed down a level.

The patch below fixes it for me, though it does have to change more than
I'd been hoping in such a fix.  Perhaps I am just being silly to resist
repeating the call to blkdev_max_block().  Does this patch work for you?

Thanks,
Hugh

[PATCH] block: replace __getblk_slow misfix by grow_dev_page fix

Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow")
is not good: a successful call to grow_buffers() cannot guarantee
that the page won't be reclaimed before the immediate next call to
__find_get_block(), which is why there was always a loop there.

Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
inode #19278: block 664: comm cc1: unable to read itable block" on console,
which pointed to this commit.

I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
sometimes fails from a missing header file, under memory pressure on
ppc G5.  I've never seen this on x86, and I've never seen it on 3.5-rc7
itself, despite that commit being in there: bisection pointed to an
irrelevant pinctrl merge, but hard to tell when failure takes between
18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).

(I've since found such __ext4_get_inode_loc errors in /var/log/messages
from previous weeks: why the message never appeared on console until
yesterday morning is a mystery for another day.)

Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus
a checkpatch nitfix).  Simplify the interface between grow_buffers()
and grow_dev_page(), and avoid the infinite loop beyond end of device
by instead checking init_page_buffers()'s end_block there (I presume
that's more efficient than a repeated call to blkdev_max_block()),
returning -ENXIO to __getblk_slow() in that case.

And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
comment, but that is worrying: are all users of __getblk() really
now prepared for a NULL bh beyond end of device, or will some oops??

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # 3.0 3.2 3.4 3.5
---

 fs/buffer.c |   66 ++++++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 36 deletions(-)

--- 3.6-rc3/fs/buffer.c	2012-08-04 09:19:20.644022328 -0700
+++ linux/fs/buffer.c	2012-08-22 17:11:57.143827063 -0700
@@ -914,7 +914,7 @@ link_dev_buffers(struct page *page, stru
 /*
  * Initialise the state of a blockdev page's buffers.
  */ 
-static void
+static sector_t
 init_page_buffers(struct page *page, struct block_device *bdev,
 			sector_t block, int size)
 {
@@ -936,33 +936,41 @@ init_page_buffers(struct page *page, str
 		block++;
 		bh = bh->b_this_page;
 	} while (bh != head);
+
+	/*
+	 * Caller needs to validate requested block against end of device.
+	 */
+	return end_block;
 }
 
 /*
  * Create the page-cache page that contains the requested block.
  *
- * This is user purely for blockdev mappings.
+ * This is used purely for blockdev mappings.
  */
-static struct page *
+static int
 grow_dev_page(struct block_device *bdev, sector_t block,
-		pgoff_t index, int size)
+		pgoff_t index, int size, int sizebits)
 {
 	struct inode *inode = bdev->bd_inode;
 	struct page *page;
 	struct buffer_head *bh;
+	sector_t end_block;
+	int ret = 0;		/* Will call free_more_memory() */
 
 	page = find_or_create_page(inode->i_mapping, index,
 		(mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE);
 	if (!page)
-		return NULL;
+		return ret;
 
 	BUG_ON(!PageLocked(page));
 
 	if (page_has_buffers(page)) {
 		bh = page_buffers(page);
 		if (bh->b_size == size) {
-			init_page_buffers(page, bdev, block, size);
-			return page;
+			end_block = init_page_buffers(page, bdev,
+						index << sizebits, size);
+			goto done;
 		}
 		if (!try_to_free_buffers(page))
 			goto failed;
@@ -982,14 +990,14 @@ grow_dev_page(struct block_device *bdev,
 	 */
 	spin_lock(&inode->i_mapping->private_lock);
 	link_dev_buffers(page, bh);
-	init_page_buffers(page, bdev, block, size);
+	end_block = init_page_buffers(page, bdev, index << sizebits, size);
 	spin_unlock(&inode->i_mapping->private_lock);
-	return page;
-
+done:
+	ret = (block < end_block) ? 1 : -ENXIO;
 failed:
 	unlock_page(page);
 	page_cache_release(page);
-	return NULL;
+	return ret;
 }
 
 /*
@@ -999,7 +1007,6 @@ failed:
 static int
 grow_buffers(struct block_device *bdev, sector_t block, int size)
 {
-	struct page *page;
 	pgoff_t index;
 	int sizebits;
 
@@ -1023,22 +1030,14 @@ grow_buffers(struct block_device *bdev,
 			bdevname(bdev, b));
 		return -EIO;
 	}
-	block = index << sizebits;
+
 	/* Create a page with the proper size buffers.. */
-	page = grow_dev_page(bdev, block, index, size);
-	if (!page)
-		return 0;
-	unlock_page(page);
-	page_cache_release(page);
-	return 1;
+	return grow_dev_page(bdev, block, index, size, sizebits);
 }
 
 static struct buffer_head *
 __getblk_slow(struct block_device *bdev, sector_t block, int size)
 {
-	int ret;
-	struct buffer_head *bh;
-
 	/* Size must be multiple of hard sectorsize */
 	if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
 			(size < 512 || size > PAGE_SIZE))) {
@@ -1051,21 +1050,20 @@ __getblk_slow(struct block_device *bdev,
 		return NULL;
 	}
 
-retry:
-	bh = __find_get_block(bdev, block, size);
-	if (bh)
-		return bh;
+	for (;;) {
+		struct buffer_head *bh;
+		int ret;
 
-	ret = grow_buffers(bdev, block, size);
-	if (ret == 0) {
-		free_more_memory();
-		goto retry;
-	} else if (ret > 0) {
 		bh = __find_get_block(bdev, block, size);
 		if (bh)
 			return bh;
+
+		ret = grow_buffers(bdev, block, size);
+		if (ret < 0)
+			return NULL;
+		if (ret == 0)
+			free_more_memory();
 	}
-	return NULL;
 }
 
 /*
@@ -1321,10 +1319,6 @@ EXPORT_SYMBOL(__find_get_block);
  * which corresponds to the passed block_device, block and size. The
  * returned buffer has its reference count incremented.
  *
- * __getblk() cannot fail - it just keeps trying.  If you pass it an
- * illegal block number, __getblk() will happily return a buffer_head
- * which represents the non-existent block.  Very weird.
- *
  * __getblk() will lock up the machine if grow_dev_page's try_to_free_buffers()
  * attempt is failing.  FIXME, perhaps?
  */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-23  4:56   ` Hugh Dickins
@ 2012-08-23 10:20     ` Jens Axboe
  2012-08-23 20:40     ` Jeff Moyer
  2012-08-28 11:14     ` Richard W.M. Jones
  2 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2012-08-23 10:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Richard W.M. Jones, Jeff Moyer, Andrew Morton, Linus Torvalds,
	Torsten Hilbrich, Josh Boyer, linux-kernel@vger.kernel.org

On 08/23/2012 06:56 AM, Hugh Dickins wrote:
> [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
> 
> Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow")
> is not good: a successful call to grow_buffers() cannot guarantee
> that the page won't be reclaimed before the immediate next call to
> __find_get_block(), which is why there was always a loop there.
> 
> Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
> inode #19278: block 664: comm cc1: unable to read itable block" on console,
> which pointed to this commit.
> 
> I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
> sometimes fails from a missing header file, under memory pressure on
> ppc G5.  I've never seen this on x86, and I've never seen it on 3.5-rc7
> itself, despite that commit being in there: bisection pointed to an
> irrelevant pinctrl merge, but hard to tell when failure takes between
> 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).
> 
> (I've since found such __ext4_get_inode_loc errors in /var/log/messages
> from previous weeks: why the message never appeared on console until
> yesterday morning is a mystery for another day.)
> 
> Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus
> a checkpatch nitfix).  Simplify the interface between grow_buffers()
> and grow_dev_page(), and avoid the infinite loop beyond end of device
> by instead checking init_page_buffers()'s end_block there (I presume
> that's more efficient than a repeated call to blkdev_max_block()),
> returning -ENXIO to __getblk_slow() in that case.
> 
> And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
> comment, but that is worrying: are all users of __getblk() really
> now prepared for a NULL bh beyond end of device, or will some oops??

Hugh, I tentatively applied this one, awaiting some test feedback before
pushing it upstream this cycle.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-23  4:56   ` Hugh Dickins
  2012-08-23 10:20     ` Jens Axboe
@ 2012-08-23 20:40     ` Jeff Moyer
  2012-08-23 22:46       ` Hugh Dickins
  2012-08-28 11:14     ` Richard W.M. Jones
  2 siblings, 1 reply; 10+ messages in thread
From: Jeff Moyer @ 2012-08-23 20:40 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Richard W.M. Jones, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

Hugh Dickins <hughd@google.com> writes:

> [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
>
> Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow")
> is not good: a successful call to grow_buffers() cannot guarantee
> that the page won't be reclaimed before the immediate next call to
> __find_get_block(), which is why there was always a loop there.
[snip]
> Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus
> a checkpatch nitfix).  Simplify the interface between grow_buffers()
> and grow_dev_page(), and avoid the infinite loop beyond end of device
> by instead checking init_page_buffers()'s end_block there (I presume
> that's more efficient than a repeated call to blkdev_max_block()),
> returning -ENXIO to __getblk_slow() in that case.
>
> And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
> comment, but that is worrying: are all users of __getblk() really
> now prepared for a NULL bh beyond end of device, or will some oops??

Hi, Hugh,

Thanks for digging into this.  I had wondered whether that loop was
required for other purposes, and you've verified that it was.  I mostly
like your fix.  Returning end_block from init_page_buffers is a little
strange, but I understand not wanting to redo the call to
blkdev_max_block.

I went ahead to re-tested with the reproducer that I had, and your patch
works fine.  Thanks again for tracking this down, and sorry I wasn't
more diligent to begin with.

Reviewed-and-Tested-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-23 20:40     ` Jeff Moyer
@ 2012-08-23 22:46       ` Hugh Dickins
  2012-08-24 13:13         ` Jeff Moyer
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2012-08-23 22:46 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Richard W.M. Jones, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

On Thu, 23 Aug 2012, Jeff Moyer wrote:
> Hugh Dickins <hughd@google.com> writes:
> 
> > [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
> >
> > Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow")
> > is not good: a successful call to grow_buffers() cannot guarantee
> > that the page won't be reclaimed before the immediate next call to
> > __find_get_block(), which is why there was always a loop there.
> [snip]
> > Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus
> > a checkpatch nitfix).  Simplify the interface between grow_buffers()
> > and grow_dev_page(), and avoid the infinite loop beyond end of device
> > by instead checking init_page_buffers()'s end_block there (I presume
> > that's more efficient than a repeated call to blkdev_max_block()),
> > returning -ENXIO to __getblk_slow() in that case.
> >
> > And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
> > comment, but that is worrying: are all users of __getblk() really
> > now prepared for a NULL bh beyond end of device, or will some oops??
> 
> Hi, Hugh,
> 
> Thanks for digging into this.  I had wondered whether that loop was
> required for other purposes, and you've verified that it was.  I mostly
> like your fix.  Returning end_block from init_page_buffers is a little
> strange,

It is a little strange, I agree.  I wrestled a lot with the way block
and end_block were known at opposite ends of the callstack, and needed
to be brought together for the check.

If you think it would be better to move the blkdev_max_block()
lookup up a level into grow_dev_page(), then pass end_block down to
init_page_buffers(), that could work as well; though we'd then still
need to report "failure" from init_page_buffers().

I was inclined to leave it precisely where you had sited it, with that
comment on passing it back up; but could change it around if preferred.

> but I understand not wanting to redo the call to blkdev_max_block.
> 
> I went ahead to re-tested with the reproducer that I had, and your patch
> works fine.  Thanks again for tracking this down, and sorry I wasn't
> more diligent to begin with.
> 
> Reviewed-and-Tested-by: Jeff Moyer <jmoyer@redhat.com>

Great, thanks a lot.  It (but equally its incorrect earlier version)
have been running fine at home under my swapping load.  I'm confident
that it fixes the issues I had been seeing, although, strictly speaking,
it'll take another two or three days to reach bisection confidence level.

Hugh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-23 22:46       ` Hugh Dickins
@ 2012-08-24 13:13         ` Jeff Moyer
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Moyer @ 2012-08-24 13:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Richard W.M. Jones, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

Hugh Dickins <hughd@google.com> writes:

> On Thu, 23 Aug 2012, Jeff Moyer wrote:
>> Hugh Dickins <hughd@google.com> writes:
>> 
>> > [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
>> >
>> > Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow")
>> > is not good: a successful call to grow_buffers() cannot guarantee
>> > that the page won't be reclaimed before the immediate next call to
>> > __find_get_block(), which is why there was always a loop there.
>> [snip]
>> > Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus
>> > a checkpatch nitfix).  Simplify the interface between grow_buffers()
>> > and grow_dev_page(), and avoid the infinite loop beyond end of device
>> > by instead checking init_page_buffers()'s end_block there (I presume
>> > that's more efficient than a repeated call to blkdev_max_block()),
>> > returning -ENXIO to __getblk_slow() in that case.
>> >
>> > And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
>> > comment, but that is worrying: are all users of __getblk() really
>> > now prepared for a NULL bh beyond end of device, or will some oops??
>> 
>> Hi, Hugh,
>> 
>> Thanks for digging into this.  I had wondered whether that loop was
>> required for other purposes, and you've verified that it was.  I mostly
>> like your fix.  Returning end_block from init_page_buffers is a little
>> strange,
>
> It is a little strange, I agree.  I wrestled a lot with the way block
> and end_block were known at opposite ends of the callstack, and needed
> to be brought together for the check.
>
> If you think it would be better to move the blkdev_max_block()
> lookup up a level into grow_dev_page(), then pass end_block down to
> init_page_buffers(), that could work as well; though we'd then still
> need to report "failure" from init_page_buffers().

I have no strong opinion;  I can live with it as-is.

> Great, thanks a lot.  It (but equally its incorrect earlier version)
> have been running fine at home under my swapping load.  I'm confident
> that it fixes the issues I had been seeing, although, strictly speaking,
> it'll take another two or three days to reach bisection confidence level.

Well, I bought your analysis, so I'm at least ready to see this get
pushed. ;-)

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-23  4:56   ` Hugh Dickins
  2012-08-23 10:20     ` Jens Axboe
  2012-08-23 20:40     ` Jeff Moyer
@ 2012-08-28 11:14     ` Richard W.M. Jones
  2012-08-28 14:35       ` Hugh Dickins
  2 siblings, 1 reply; 10+ messages in thread
From: Richard W.M. Jones @ 2012-08-28 11:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Jeff Moyer, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

On Wed, Aug 22, 2012 at 09:56:12PM -0700, Hugh Dickins wrote:
> On Wed, 22 Aug 2012, Richard W.M. Jones wrote:
> > On Tue, Aug 21, 2012 at 06:33:45PM -0700, Hugh Dickins wrote:
> > > Jeff,
> > > 
> > > Your commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow"),
> > > already gone into 3.* stable, is not good.  Could you and your testers
> > > please give this alternative a try - I think it should work, and have
> > > started it on a few days' memory load on 3.5, but not tried your case.
> > 
> > I have tested your patch and it does NOT fix the problem in
> > http://bugzilla.redhat.com/835019
> > Kernel 3.6.0 + your patch => mount goes into a loop when mounting
> > a small empty partition.  Please do NOT apply this as it will
> > cause a regression!
> 
> That was all very helpful information that you provided, thank you.
> Sorry, I had missed how "block" is massaged as it's passed down a level.
> 
> The patch below fixes it for me, though it does have to change more than
> I'd been hoping in such a fix.  Perhaps I am just being silly to resist
> repeating the call to blkdev_max_block().  Does this patch work for you?
> 
> Thanks,
> Hugh
> 
> [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix

I noticed this (second version) went upstream already.  Nevertheless I
tested it today and it doesn't cause a regression of RHBZ#835019.

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
  2012-08-28 11:14     ` Richard W.M. Jones
@ 2012-08-28 14:35       ` Hugh Dickins
  0 siblings, 0 replies; 10+ messages in thread
From: Hugh Dickins @ 2012-08-28 14:35 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Jeff Moyer, Andrew Morton, Linus Torvalds, Jens Axboe,
	Torsten Hilbrich, Josh Boyer, linux-kernel

On Tue, 28 Aug 2012, Richard W.M. Jones wrote:
> On Wed, Aug 22, 2012 at 09:56:12PM -0700, Hugh Dickins wrote:
> > [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix
> 
> I noticed this (second version) went upstream already.  Nevertheless I
> tested it today and it doesn't cause a regression of RHBZ#835019.

Thank you for testing and confirming, Rich.
Once Jeff had tested it, we felt safe to go ahead while you were off.

Hugh

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-08-28 14:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-22  1:33 [PATCH] block: replace __getblk_slow misfix by grow_dev_page fix Hugh Dickins
2012-08-22  9:12 ` Richard W.M. Jones
2012-08-22 11:52 ` Richard W.M. Jones
2012-08-23  4:56   ` Hugh Dickins
2012-08-23 10:20     ` Jens Axboe
2012-08-23 20:40     ` Jeff Moyer
2012-08-23 22:46       ` Hugh Dickins
2012-08-24 13:13         ` Jeff Moyer
2012-08-28 11:14     ` Richard W.M. Jones
2012-08-28 14:35       ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.