cramfs corruption after BLKFLSBUF on loop device

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* cramfs corruption after BLKFLSBUF on loop device
@ 2006-05-29 21:40 Olaf Hering
  2006-05-30 13:19 ` Olaf Hering
  2006-05-30 18:24 ` Olaf Hering
  0 siblings, 2 replies; 23+ messages in thread
From: Olaf Hering @ 2006-05-29 21:40 UTC (permalink / raw)
  To: linux-kernel

This script will cause cramfs decompression errors, on SMP at least:

#!/bin/bash
while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
while :;do ps faxs  </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do dmesg    </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done

...
Error -3 while decompressing!
c0000000009592a2(2649)->c0000000edf87000(4096)
Error -3 while decompressing!
c000000000959298(2520)->c0000000edbc7000(4096)
Error -3 while decompressing!
c000000000959c70(2489)->c0000000f1482000(4096)
Error -3 while decompressing!
c00000000095a629(2355)->c0000000edaff000(4096)
Error -3 while decompressing!
...

evms_access does the ioctl (lots of them) on the loop device.
Its a long standing bug, 2.6.5 fails as well. cramfs_read() clears parts
of the src buffer because the page is not uptodate. invalidate_bdev()
touched the page last.
cramfs_read() was called from line 480 or 490 when the
PageUptodate(page) test fails.

...
    464 static int cramfs_readpage(struct file *file, struct page * page)
..
    479                 if (page->index)
    480                         start_offset = *(u32 *) cramfs_read(sb, blkptr_offset-4, 4);
..
    488                         bytes_filled = cramfs_uncompress_block(pgdata,
    489                                  PAGE_CACHE_SIZE,
    490                                  cramfs_read(sb, start_offset, compr_len),
    491                                  compr_len);
...

There are rumors that cramfs is not smp safe...
Maybe the only hope is to tell evms to not do that ioctl for loop.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: cramfs corruption after BLKFLSBUF on loop device
  2006-05-29 21:40 cramfs corruption after BLKFLSBUF on loop device Olaf Hering
@ 2006-05-30 13:19 ` Olaf Hering
  2006-05-30 18:24 ` Olaf Hering
  1 sibling, 0 replies; 23+ messages in thread
From: Olaf Hering @ 2006-05-30 13:19 UTC (permalink / raw)
  To: linux-kernel

 On Mon, May 29, Olaf Hering wrote:

> This script will cause cramfs decompression errors, on SMP at least:
> 
> #!/bin/bash
> while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
> while :;do ps faxs  </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do dmesg    </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done
> 
> ...
> Error -3 while decompressing!
> c0000000009592a2(2649)->c0000000edf87000(4096)
> Error -3 while decompressing!
> c000000000959298(2520)->c0000000edbc7000(4096)
> Error -3 while decompressing!
> c000000000959c70(2489)->c0000000f1482000(4096)
> Error -3 while decompressing!
> c00000000095a629(2355)->c0000000edaff000(4096)
> Error -3 while decompressing!
> ...

This change works for me, the added BUG() does not trigger.
read_cache_page() returns the page in PageUptodate() state.
But a few ticks later, invalidate_complete_page() calls ClearPageUptodate(),
on another cpu.
The SetPageDirty() works for my testcase, but not without the mb().
Does anyone know what sideeffects the SetPageDirty() has for the
loopmounted cramfs?



---
 fs/cramfs/inode.c      |    2 ++
 fs/cramfs/uncompress.c |    1 +
 2 files changed, 3 insertions(+)

Index: linux-2.6.16.16-1.6/fs/cramfs/inode.c
===================================================================
--- linux-2.6.16.16-1.6.orig/fs/cramfs/inode.c
+++ linux-2.6.16.16-1.6/fs/cramfs/inode.c
@@ -186,6 +186,8 @@ static void *cramfs_read(struct super_bl
 			/* synchronous error? */
 			if (IS_ERR(page))
 				page = NULL;
+			SetPageDirty(page);
+			mb();
 		}
 		pages[i] = page;
 	}
Index: linux-2.6.16.16-1.6/fs/cramfs/uncompress.c
===================================================================
--- linux-2.6.16.16-1.6.orig/fs/cramfs/uncompress.c
+++ linux-2.6.16.16-1.6/fs/cramfs/uncompress.c
@@ -50,6 +50,7 @@ int cramfs_uncompress_block(void *dst, i
 err:
 	printk("Error %d while decompressing!\n", err);
 	printk("%p(%d)->%p(%d)\n", src, srclen, dst, dstlen);
+	BUG_ON(1);
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: cramfs corruption after BLKFLSBUF on loop device
  2006-05-29 21:40 cramfs corruption after BLKFLSBUF on loop device Olaf Hering
  2006-05-30 13:19 ` Olaf Hering
@ 2006-05-30 18:24 ` Olaf Hering
  2006-06-01 18:49   ` [PATCH] " Olaf Hering
  1 sibling, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-05-30 18:24 UTC (permalink / raw)
  To: linux-kernel, Al Viro

 On Mon, May 29, Olaf Hering wrote:

> This script will cause cramfs decompression errors, on SMP at least:
> 
> #!/bin/bash
> while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
> while :;do ps faxs  </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do dmesg    </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done
> 
> ...
> Error -3 while decompressing!
> c0000000009592a2(2649)->c0000000edf87000(4096)
> Error -3 while decompressing!
> c000000000959298(2520)->c0000000edbc7000(4096)
> Error -3 while decompressing!
> c000000000959c70(2489)->c0000000f1482000(4096)
> Error -3 while decompressing!
> c00000000095a629(2355)->c0000000edaff000(4096)
> Error -3 while decompressing!
> ...
> 
> evms_access does the ioctl (lots of them) on the loop device.
> Its a long standing bug, 2.6.5 fails as well. cramfs_read() clears parts
> of the src buffer because the page is not uptodate. invalidate_bdev()
> touched the page last.
> cramfs_read() was called from line 480 or 490 when the
> PageUptodate(page) test fails.

Al, you added the PageUptodate check for 2.6.2.

http://linux.bkbits.net:8080/linux-2.6/gnupatch@400c1cddyzRoKomOj57xxUAmKnMbZQ

Should there be some locking for blockdev --flushbufs, or is the check
just bogus? 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-05-30 18:24 ` Olaf Hering
@ 2006-06-01 18:49   ` Olaf Hering
  2006-06-01 19:12     ` Andrew Morton
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Olaf Hering @ 2006-06-01 18:49 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Al Viro


This script will cause cramfs decompression errors, on SMP at least:

#!/bin/bash                                                                                                                                                          
while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
while :;do ps faxs  </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do dmesg    </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done

(The used executables come from the symlinked /mounts/instsys directory)

...
Error -3 while decompressing!
c0000000009592a2(2649)->c0000000edf87000(4096)
Error -3 while decompressing!
c000000000959298(2520)->c0000000edbc7000(4096)
Error -3 while decompressing!
c000000000959c70(2489)->c0000000f1482000(4096) 
Error -3 while decompressing!
c00000000095a629(2355)->c0000000edaff000(4096)
Error -3 while decompressing!
...

Its a long standing bug, introduced in 2.6.2.

cramfs_read() clears parts of the src buffer because the page is not uptodate.
invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set ClearPageUptodate()
after cramfs_read() got the page from read_cache_page()
If PageUptodate() fails, read the page again before using it.
There is still a small window were the page may not be uptodate before copying
its contents away.

evms_access does the BLKFLSBUF ioctl (lots of them) on the loop device. This will
corrupt the SuSE installation image on SMP kernels, leading to random segfaults.

Signed-off-by: Olaf Hering <olh@suse.de>

---
 fs/cramfs/inode.c |   70 ++++++++++++++++++++++++------------------------------
 1 file changed, 32 insertions(+), 38 deletions(-)

Index: linux-2.6.16.16-1.6/fs/cramfs/inode.c
===================================================================
--- linux-2.6.16.16-1.6.orig/fs/cramfs/inode.c
+++ linux-2.6.16.16-1.6/fs/cramfs/inode.c
@@ -147,8 +147,8 @@ static int next_buffer;
 static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
 {
 	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
-	struct page *pages[BLKS_PER_BUF];
-	unsigned i, blocknr, buffer, unread;
+	struct page *page;
+	unsigned i, blocknr, buffer, readagain;
 	unsigned long devsize;
 	char *data;
 
@@ -174,48 +174,42 @@ static void *cramfs_read(struct super_bl
 
 	devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT;
 
-	/* Ok, read in BLKS_PER_BUF pages completely first. */
-	unread = 0;
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = NULL;
-
-		if (blocknr + i < devsize) {
-			page = read_cache_page(mapping, blocknr + i,
-				(filler_t *)mapping->a_ops->readpage,
-				NULL);
-			/* synchronous error? */
-			if (IS_ERR(page))
-				page = NULL;
-		}
-		pages[i] = page;
-	}
-
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
-		if (page) {
-			wait_on_page_locked(page);
-			if (!PageUptodate(page)) {
-				/* asynchronous error */
-				page_cache_release(page);
-				pages[i] = NULL;
-			}
-		}
-	}
-
 	buffer = next_buffer;
 	next_buffer = NEXT_BUFFER(buffer);
 	buffer_blocknr[buffer] = blocknr;
 	buffer_dev[buffer] = sb;
-
 	data = read_buffers[buffer];
+
 	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
-		if (page) {
-			memcpy(data, kmap(page), PAGE_CACHE_SIZE);
-			kunmap(page);
-			page_cache_release(page);
-		} else
-			memset(data, 0, PAGE_CACHE_SIZE);
+		if (blocknr + i < devsize) {
+			page = NULL;
+			for (readagain = 0; readagain < 5; readagain++) {
+				page = read_cache_page(mapping, blocknr + i,
+					(filler_t *)mapping->a_ops->readpage,
+					NULL);
+				/* synchronous error? */
+				if (IS_ERR(page)) {
+					page = NULL;
+					break;
+				}
+				wait_on_page_locked(page);
+				if (PageUptodate(page))
+					break;
+				/* asynchronous error */
+				/* maybe BLKFLSBUF flushed the page */
+				page_cache_release(page);
+				page = NULL;
+			}
+			if (page) {
+				memcpy(data, kmap(page), PAGE_CACHE_SIZE);
+				kunmap(page);
+				page_cache_release(page);
+			} else
+				memset(data, 0, PAGE_CACHE_SIZE);
+			if (readagain)
+				printk(KERN_DEBUG "cramfs_read got %s Uptodate page after %d attempt(s)\n",
+						page ? "an" : "no", readagain);
+		}
 		data += PAGE_CACHE_SIZE;
 	}
 	return read_buffers[buffer] + offset;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 18:49   ` [PATCH] " Olaf Hering
@ 2006-06-01 19:12     ` Andrew Morton
  2006-06-01 19:15       ` Andrew Morton
  2006-06-01 20:10       ` Olaf Hering
  2006-06-01 20:17     ` Chris Mason
  2006-09-20 13:20     ` Olaf Hering
  2 siblings, 2 replies; 23+ messages in thread
From: Andrew Morton @ 2006-06-01 19:12 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, viro

On Thu, 1 Jun 2006 20:49:38 +0200
Olaf Hering <olh@suse.de> wrote:

> 
> This script will cause cramfs decompression errors, on SMP at least:
> 
> #!/bin/bash                                                                                                                                                          
> while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
> while :;do ps faxs  </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do dmesg    </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done
> 
> (The used executables come from the symlinked /mounts/instsys directory)
> 
> ...
> Error -3 while decompressing!
> c0000000009592a2(2649)->c0000000edf87000(4096)
> Error -3 while decompressing!
> c000000000959298(2520)->c0000000edbc7000(4096)
> Error -3 while decompressing!
> c000000000959c70(2489)->c0000000f1482000(4096) 
> Error -3 while decompressing!
> c00000000095a629(2355)->c0000000edaff000(4096)
> Error -3 while decompressing!
> ...
> 
> Its a long standing bug, introduced in 2.6.2.
> 
> cramfs_read() clears parts of the src buffer because the page is not uptodate.
> invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set ClearPageUptodate()
> after cramfs_read() got the page from read_cache_page()
> If PageUptodate() fails, read the page again before using it.
> There is still a small window were the page may not be uptodate before copying
> its contents away.
> 
> evms_access does the BLKFLSBUF ioctl (lots of them) on the loop device. This will
> corrupt the SuSE installation image on SMP kernels, leading to random segfaults.
> 

OK, invalidate_inode_pages().

> +
>  	for (i = 0; i < BLKS_PER_BUF; i++) {
> -		struct page *page = pages[i];
> -		if (page) {
> -			memcpy(data, kmap(page), PAGE_CACHE_SIZE);
> -			kunmap(page);
> -			page_cache_release(page);
> -		} else
> -			memset(data, 0, PAGE_CACHE_SIZE);
> +		if (blocknr + i < devsize) {
> +			page = NULL;
> +			for (readagain = 0; readagain < 5; readagain++) {
> +				page = read_cache_page(mapping, blocknr + i,
> +					(filler_t *)mapping->a_ops->readpage,
> +					NULL);
> +				/* synchronous error? */
> +				if (IS_ERR(page)) {
> +					page = NULL;
> +					break;
> +				}
> +				wait_on_page_locked(page);
> +				if (PageUptodate(page))
> +					break;
> +				/* asynchronous error */
> +				/* maybe BLKFLSBUF flushed the page */
> +				page_cache_release(page);
> +				page = NULL;
> +			}
> +			if (page) {
> +				memcpy(data, kmap(page), PAGE_CACHE_SIZE);
> +				kunmap(page);
> +				page_cache_release(page);
> +			} else
> +				memset(data, 0, PAGE_CACHE_SIZE);
> +			if (readagain)
> +				printk(KERN_DEBUG "cramfs_read got %s Uptodate page after %d attempt(s)\n",
> +						page ? "an" : "no", readagain);
> +		}

This code absolutely needs a comment telling the poor reader what it's in
there for.

It's still racy.  Do the memcpy while the page is locked:


retry:
	read_cache_page()
	lock_page()
	if (!PageUptodate()) {
		if (readagain-- != 0)
			goto retry;
		give_up();
	}
	memcpy();
	unlock_page();

Also, let's use kmap_atomic() while we're in there.

Of course, this will all just fail less often than it presently does.  We'd
be better off taking a lock if poss to keep the ioctl away.  I'd have
thought that it'd be appropriate to take i_mutex while running
invalidate_inode_pages().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 19:12     ` Andrew Morton
@ 2006-06-01 19:15       ` Andrew Morton
  2006-06-01 20:10       ` Olaf Hering
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2006-06-01 19:15 UTC (permalink / raw)
  To: olh, linux-kernel, viro

On Thu, 1 Jun 2006 12:12:00 -0700
Andrew Morton <akpm@osdl.org> wrote:

> I'd have
> thought that it'd be appropriate to take i_mutex while running
> invalidate_inode_pages().

Let me add: it'd be appropriate, but also difficult -
invalidate_inode_pages() is called from all sorts of contexts, including
inside spinlock.

A quickie fix would be to take i_mutex at the top level, on entry to the
ioctl.  But that'll only fix the BLKFLSBUF case.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 19:12     ` Andrew Morton
  2006-06-01 19:15       ` Andrew Morton
@ 2006-06-01 20:10       ` Olaf Hering
  2006-06-01 21:24         ` Andrew Morton
  1 sibling, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-06-01 20:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro


This script will cause cramfs decompression errors, on SMP at least:

#!/bin/bash                                                                                                                                                          
while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
while :;do ps faxs  </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do dmesg    </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done

(The used executables come from the symlinked /mounts/instsys directory)

...
Error -3 while decompressing!
c0000000009592a2(2649)->c0000000edf87000(4096)
Error -3 while decompressing!
c000000000959298(2520)->c0000000edbc7000(4096)
Error -3 while decompressing!
c000000000959c70(2489)->c0000000f1482000(4096) 
Error -3 while decompressing!
c00000000095a629(2355)->c0000000edaff000(4096)
Error -3 while decompressing!
...

Its a long standing bug, introduced in 2.6.2.

cramfs_read() clears parts of the src buffer because the page is not uptodate.
invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set ClearPageUptodate()
after cramfs_read() got the page from read_cache_page()
If PageUptodate() fails, read the page again before using it.

evms_access does the BLKFLSBUF ioctl (lots of them) on the loop device. This will
corrupt the SuSE installation image on SMP kernels, leading to random segfaults.

Signed-off-by: Olaf Hering <olh@suse.de>

---
 fs/cramfs/inode.c |   70 ++++++++++++++++++++++++------------------------------
 1 file changed, 32 insertions(+), 38 deletions(-)

Index: linux-2.6.16.16-1.6/fs/cramfs/inode.c
===================================================================
--- linux-2.6.16.16-1.6.orig/fs/cramfs/inode.c
+++ linux-2.6.16.16-1.6/fs/cramfs/inode.c
@@ -140,6 +140,25 @@ static unsigned buffer_blocknr[READ_BUFF
 static struct super_block * buffer_dev[READ_BUFFERS];
 static int next_buffer;
 
+/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */
+static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n)
+{
+	struct page *page;
+	int readagain = 5;
+retry:
+	page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL);
+	if (IS_ERR(page))
+		return NULL;
+	lock_page(page);
+	if (PageUptodate(page))
+		return page;
+	unlock_page(page);
+	page_cache_release(page);
+	if (readagain--)
+		goto retry;
+	return NULL;
+}
+
 /*
  * Returns a pointer to a buffer containing at least LEN bytes of
  * filesystem starting at byte offset OFFSET into the filesystem.
@@ -147,8 +166,8 @@ static int next_buffer;
 static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
 {
 	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
-	struct page *pages[BLKS_PER_BUF];
-	unsigned i, blocknr, buffer, unread;
+	struct page *page;
+	unsigned i, blocknr, buffer;
 	unsigned long devsize;
 	char *data;
 
@@ -174,48 +193,23 @@ static void *cramfs_read(struct super_bl
 
 	devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT;
 
-	/* Ok, read in BLKS_PER_BUF pages completely first. */
-	unread = 0;
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = NULL;
-
-		if (blocknr + i < devsize) {
-			page = read_cache_page(mapping, blocknr + i,
-				(filler_t *)mapping->a_ops->readpage,
-				NULL);
-			/* synchronous error? */
-			if (IS_ERR(page))
-				page = NULL;
-		}
-		pages[i] = page;
-	}
-
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
-		if (page) {
-			wait_on_page_locked(page);
-			if (!PageUptodate(page)) {
-				/* asynchronous error */
-				page_cache_release(page);
-				pages[i] = NULL;
-			}
-		}
-	}
-
 	buffer = next_buffer;
 	next_buffer = NEXT_BUFFER(buffer);
 	buffer_blocknr[buffer] = blocknr;
 	buffer_dev[buffer] = sb;
-
 	data = read_buffers[buffer];
+
 	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
-		if (page) {
-			memcpy(data, kmap(page), PAGE_CACHE_SIZE);
-			kunmap(page);
-			page_cache_release(page);
-		} else
-			memset(data, 0, PAGE_CACHE_SIZE);
+		if (blocknr + i < devsize) {
+			page = cramfs_read_cache_page(mapping, blocknr + i);
+			if (page) {
+				memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE);
+				kunmap(page);
+				unlock_page(page);
+				page_cache_release(page);
+			} else
+				memset(data, 0, PAGE_CACHE_SIZE);
+		}
 		data += PAGE_CACHE_SIZE;
 	}
 	return read_buffers[buffer] + offset;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 20:10       ` Olaf Hering
@ 2006-06-01 21:24         ` Andrew Morton
  2006-06-01 21:41           ` Olaf Hering
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2006-06-01 21:24 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, viro

Olaf Hering <olh@suse.de> wrote:
>
> 
>  
> +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */
> +static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n)
> +{
> +	struct page *page;
> +	int readagain = 5;
> +retry:
> +	page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL);
> +	if (IS_ERR(page))
> +		return NULL;
> +	lock_page(page);
> +	if (PageUptodate(page))
> +		return page;
> +	unlock_page(page);
> +	page_cache_release(page);
> +	if (readagain--)
> +		goto retry;
> +	return NULL;
> +}

Better, but it's still awful, isn't it?  The things you were discussing
with Chris look more promising.  PG_Dirty would be a bit of a hack, but at
least it'd be a 100% reliable hack, whereas the above is a
whatever-the-previous-failure-rate-was-to-the-fifth hack.

> +			page = cramfs_read_cache_page(mapping, blocknr + i);
> +			if (page) {
> +				memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE);
> +				kunmap(page);

kunmap_atomic, please.

> +				unlock_page(page);
> +				page_cache_release(page);
> +			} else
> +				memset(data, 0, PAGE_CACHE_SIZE);
> +		}
>  		data += PAGE_CACHE_SIZE;
>  	}
>  	return read_buffers[buffer] + offset;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 21:24         ` Andrew Morton
@ 2006-06-01 21:41           ` Olaf Hering
  2006-06-01 21:57             ` Andrew Morton
  0 siblings, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-06-01 21:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro

 On Thu, Jun 01, Andrew Morton wrote:

> Olaf Hering <olh@suse.de> wrote:
> >
> > 
> >  
> > +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */
> > +static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n)
> > +{
> > +	struct page *page;
> > +	int readagain = 5;
> > +retry:
> > +	page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL);
> > +	if (IS_ERR(page))
> > +		return NULL;
> > +	lock_page(page);
> > +	if (PageUptodate(page))
> > +		return page;
> > +	unlock_page(page);
> > +	page_cache_release(page);
> > +	if (readagain--)
> > +		goto retry;
> > +	return NULL;
> > +}
> 
> Better, but it's still awful, isn't it?  The things you were discussing
> with Chris look more promising.  PG_Dirty would be a bit of a hack, but at
> least it'd be a 100% reliable hack, whereas the above is a
> whatever-the-previous-failure-rate-was-to-the-fifth hack.

Do you want it like that?

lock_page(page);
if (PageUptodate(page)) {
        SetPageDirty(page);
        mb();
        return page;
}

and perhaps a ClearPageDirty() after memcpy.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 21:41           ` Olaf Hering
@ 2006-06-01 21:57             ` Andrew Morton
  2006-06-02  8:43               ` Olaf Hering
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2006-06-01 21:57 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, viro

Olaf Hering <olh@suse.de> wrote:
>
>  On Thu, Jun 01, Andrew Morton wrote:
> 
> > Olaf Hering <olh@suse.de> wrote:
> > >
> > > 
> > >  
> > > +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */
> > > +static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n)
> > > +{
> > > +	struct page *page;
> > > +	int readagain = 5;
> > > +retry:
> > > +	page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL);
> > > +	if (IS_ERR(page))
> > > +		return NULL;
> > > +	lock_page(page);
> > > +	if (PageUptodate(page))
> > > +		return page;
> > > +	unlock_page(page);
> > > +	page_cache_release(page);
> > > +	if (readagain--)
> > > +		goto retry;
> > > +	return NULL;
> > > +}
> > 
> > Better, but it's still awful, isn't it?  The things you were discussing
> > with Chris look more promising.  PG_Dirty would be a bit of a hack, but at
> > least it'd be a 100% reliable hack, whereas the above is a
> > whatever-the-previous-failure-rate-was-to-the-fifth hack.
> 
> Do you want it like that?
> 
> lock_page(page);
> if (PageUptodate(page)) {
>         SetPageDirty(page);
>         mb();
>         return page;
> }

Not really ;)  It's hacky.  It'd be better to take a lock.

I expect it'd work though, as long as...

> and perhaps a ClearPageDirty() after memcpy.

I assume this is a read-only filesystem?  I mean, if someone had really
tried to dirty the page in the meanwhile via, say, munmap or msync which
don't lock the page, we just lost their data.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 21:57             ` Andrew Morton
@ 2006-06-02  8:43               ` Olaf Hering
  2006-06-02  9:11                 ` Andrew Morton
  0 siblings, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-06-02  8:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro

 On Thu, Jun 01, Andrew Morton wrote:


> > Do you want it like that?
> > 
> > lock_page(page);
> > if (PageUptodate(page)) {
> >         SetPageDirty(page);
> >         mb();
> >         return page;
> > }
> 
> Not really ;)  It's hacky.  It'd be better to take a lock.

Which lock exactly? I'm not sure how to proceed from here.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-02  8:43               ` Olaf Hering
@ 2006-06-02  9:11                 ` Andrew Morton
  2006-06-02 19:14                   ` Olaf Hering
  2006-06-02 19:37                   ` Olaf Hering
  0 siblings, 2 replies; 23+ messages in thread
From: Andrew Morton @ 2006-06-02  9:11 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, viro

On Fri, 2 Jun 2006 10:43:27 +0200
Olaf Hering <olh@suse.de> wrote:

>  On Thu, Jun 01, Andrew Morton wrote:
> 
> 
> > > Do you want it like that?
> > > 
> > > lock_page(page);
> > > if (PageUptodate(page)) {
> > >         SetPageDirty(page);
> > >         mb();
> > >         return page;
> > > }
> > 
> > Not really ;)  It's hacky.  It'd be better to take a lock.
> 
> Which lock exactly?

Ah, sorry, there isn't such a lock.  I was just carrying on.

> I'm not sure how to proceed from here.

I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked
page and implement a_ops.releasepage(), which will fail if PageChecked(),
and will succeed otherwise:

/*
 * cramfs_releasepage() will fail if cramfs_read() set PG_checked.  This
 * is so that invalidate_inode_pages() cannot zap the page while
 * cramfs_read() is trying to get at its contents.
 */
cramfs_releasepage(...)
{
	if (PageChecked(page))
		return 0;
	return 1;
}


cramfs_read(...)
{
	lock_page(page);
	SetPagePrivate(page);
	SetPageChecked(page);
	read_mapping_page(...);
	lock_page(page);
	if (page->mapping == NULL) {
		/* truncate got there first */
		unlock_page(page);
		bale();
	}
	memcpy();
	ClearPageChecked(page);
	ClearPagePrivate(page);
	unlock_page(page);
}

PG_checked is a filesystem-private flag.  It'll soon be renamed to
PG_fs_misc.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-02  9:11                 ` Andrew Morton
@ 2006-06-02 19:14                   ` Olaf Hering
  2006-06-02 19:41                     ` Andrew Morton
  2006-06-02 19:37                   ` Olaf Hering
  1 sibling, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-06-02 19:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro

 On Fri, Jun 02, Andrew Morton wrote:

> I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked
> page and implement a_ops.releasepage(), which will fail if PageChecked(),
> and will succeed otherwise:
> 
> /*
>  * cramfs_releasepage() will fail if cramfs_read() set PG_checked.  This
>  * is so that invalidate_inode_pages() cannot zap the page while
>  * cramfs_read() is trying to get at its contents.
>  */
> cramfs_releasepage(...)
> {
> 	if (PageChecked(page))
> 		return 0;
> 	return 1;
> }

This function is not triggered in my testing.

> cramfs_read(...)
> {
> 	lock_page(page);
> 	SetPagePrivate(page);
> 	SetPageChecked(page);
> 	read_mapping_page(...);
> 	lock_page(page);
> 	if (page->mapping == NULL) {
> 		/* truncate got there first */
> 		unlock_page(page);
> 		bale();
> 	}
> 	memcpy();
> 	ClearPageChecked(page);
> 	ClearPagePrivate(page);
> 	unlock_page(page);
> }

recursive recursion? Where is page supposed to come from? Did you mean
to call all this with a page returned from read_mapping_page(), and
call read_mapping_page() twice?

My version seems to leak memory somehow, Inactive and slab buffer_head
keeps growing. I guess something is missing in the picture.

Doesnt read_cache_page lock the page already?
read_cache_page
  __read_cache_page
    add_to_page_cache_lru
      add_to_page_cache
        SetPageLocked

--- /tmp/meminfo-1149269715     2006-06-02 13:35:15.439683381 -0400
+++ x   2006-06-02 13:53:20.379683381 -0400
@@ -1,23 +1,23 @@
 MemTotal:      3952952 kB
-MemFree:       3395708 kB
+MemFree:       2966276 kB
 Buffers:             0 kB
-Cached:         328960 kB
+Cached:         326904 kB
 SwapCached:          0 kB
-Active:         366884 kB
-Inactive:       111496 kB
+Active:         381720 kB
+Inactive:       518188 kB
 HighTotal:           0 kB
 HighFree:            0 kB
 LowTotal:      3952952 kB
-LowFree:       3395708 kB
+LowFree:       2966276 kB
 SwapTotal:           0 kB
 SwapFree:            0 kB
 Dirty:               0 kB
 Writeback:           0 kB
-Mapped:          56320 kB
-Slab:            60608 kB
+Mapped:          60440 kB
+Slab:            68704 kB
 CommitLimit:   1976476 kB
-Committed_AS:   155304 kB
-PageTables:       1216 kB
+Committed_AS:   158780 kB
+PageTables:       1112 kB
 VmallocTotal: 8589934592 kB
 VmallocUsed:      2452 kB
 VmallocChunk: 8589932068 kB


diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 9efcc3a..1266b8e 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -141,6 +141,35 @@ static unsigned buffer_blocknr[READ_BUFF
 static struct super_block * buffer_dev[READ_BUFFERS];
 static int next_buffer;
 
+static void cramfs_read_putpage(struct page *page)
+{
+	ClearPageChecked(page);
+	ClearPagePrivate(page);
+	unlock_page(page);
+	page_cache_release(page);
+}
+
+/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */
+static struct page *cramfs_read_getpage(struct address_space *m, unsigned int n)
+{
+	struct page *page;
+	int readagain = 5;
+retry:
+	page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL);
+	if (IS_ERR(page))
+		return NULL;
+	lock_page(page);
+	SetPagePrivate(page);
+	SetPageChecked(page);
+	if (PageUptodate(page))
+		return page;
+	cramfs_read_putpage(page);
+	printk("cramfs_read_getpage %d %p\n", 5 - readagain + 1, page);
+	if (readagain--)
+		goto retry;
+	return NULL;
+}
+
 /*
  * Returns a pointer to a buffer containing at least LEN bytes of
  * filesystem starting at byte offset OFFSET into the filesystem.
@@ -148,8 +177,8 @@ static int next_buffer;
 static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
 {
 	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
-	struct page *pages[BLKS_PER_BUF];
-	unsigned i, blocknr, buffer, unread;
+	struct page *page;
+	unsigned i, blocknr, buffer;
 	unsigned long devsize;
 	char *data;
 
@@ -175,48 +204,22 @@ static void *cramfs_read(struct super_bl
 
 	devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT;
 
-	/* Ok, read in BLKS_PER_BUF pages completely first. */
-	unread = 0;
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = NULL;
-
-		if (blocknr + i < devsize) {
-			page = read_cache_page(mapping, blocknr + i,
-				(filler_t *)mapping->a_ops->readpage,
-				NULL);
-			/* synchronous error? */
-			if (IS_ERR(page))
-				page = NULL;
-		}
-		pages[i] = page;
-	}
-
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
-		if (page) {
-			wait_on_page_locked(page);
-			if (!PageUptodate(page)) {
-				/* asynchronous error */
-				page_cache_release(page);
-				pages[i] = NULL;
-			}
-		}
-	}
-
 	buffer = next_buffer;
 	next_buffer = NEXT_BUFFER(buffer);
 	buffer_blocknr[buffer] = blocknr;
 	buffer_dev[buffer] = sb;
-
 	data = read_buffers[buffer];
+
 	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
-		if (page) {
-			memcpy(data, kmap(page), PAGE_CACHE_SIZE);
-			kunmap(page);
-			page_cache_release(page);
-		} else
-			memset(data, 0, PAGE_CACHE_SIZE);
+		if (blocknr + i < devsize) {
+			page = cramfs_read_getpage(mapping, blocknr + i);
+			if (page) {
+				memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE);
+				kunmap_atomic(page, KM_USER0);
+				cramfs_read_putpage(page);
+			} else
+				memset(data, 0, PAGE_CACHE_SIZE);
+		}
 		data += PAGE_CACHE_SIZE;
 	}
 	return read_buffers[buffer] + offset;
@@ -501,8 +504,19 @@ static int cramfs_readpage(struct file *
 	return 0;
 }
 
+static int cramfs_releasepage(struct page *page, gfp_t mask)
+{
+	int ret = -ENOENT;
+	if (page)
+		ret = !PageChecked(page);
+	printk("cramfs_releasepage called for page %p checked %d\n", page, ret);
+	WARN_ON(1);
+	return ret;
+}
+
 static struct address_space_operations cramfs_aops = {
-	.readpage = cramfs_readpage
+	.readpage = cramfs_readpage,
+	.releasepage = cramfs_releasepage,
 };
 
 /*

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-02 19:14                   ` Olaf Hering
@ 2006-06-02 19:41                     ` Andrew Morton
  2006-06-02 21:06                       ` Olaf Hering
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2006-06-02 19:41 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, viro

On Fri, 2 Jun 2006 21:14:30 +0200
Olaf Hering <olh@suse.de> wrote:

>  On Fri, Jun 02, Andrew Morton wrote:
> 
> > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked
> > page and implement a_ops.releasepage(), which will fail if PageChecked(),
> > and will succeed otherwise:
> > 
> > /*
> >  * cramfs_releasepage() will fail if cramfs_read() set PG_checked.  This
> >  * is so that invalidate_inode_pages() cannot zap the page while
> >  * cramfs_read() is trying to get at its contents.
> >  */
> > cramfs_releasepage(...)
> > {
> > 	if (PageChecked(page))
> > 		return 0;
> > 	return 1;
> > }
> 
> This function is not triggered in my testing.
> 
> > cramfs_read(...)
> > {
> > 	lock_page(page);
> > 	SetPagePrivate(page);
> > 	SetPageChecked(page);
> > 	read_mapping_page(...);
> > 	lock_page(page);
> > 	if (page->mapping == NULL) {
> > 		/* truncate got there first */
> > 		unlock_page(page);
> > 		bale();
> > 	}
> > 	memcpy();
> > 	ClearPageChecked(page);
> > 	ClearPagePrivate(page);
> > 	unlock_page(page);
> > }
> 
> recursive recursion? Where is page supposed to come from? Did you mean
> to call all this with a page returned from read_mapping_page(), and
> call read_mapping_page() twice?

That was all very-pseudo code.

> My version seems to leak memory somehow, Inactive and slab buffer_head
> keeps growing. I guess something is missing in the picture.
> 
> Doesnt read_cache_page lock the page already?
> read_cache_page
>   __read_cache_page
>     add_to_page_cache_lru
>       add_to_page_cache
>         SetPageLocked

read_cache_page() returns with the page locked if there's IO in flight
against it.

>  
> +static void cramfs_read_putpage(struct page *page)
> +{
> +	ClearPageChecked(page);
> +	ClearPagePrivate(page);
> +	unlock_page(page);
> +	page_cache_release(page);
> +}
> +
> +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */
> +static struct page *cramfs_read_getpage(struct address_space *m, unsigned int n)
> +{
> +	struct page *page;
> +	int readagain = 5;
> +retry:
> +	page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL);
> +	if (IS_ERR(page))
> +		return NULL;
> +	lock_page(page);
> +	SetPagePrivate(page);
> +	SetPageChecked(page);
> +	if (PageUptodate(page))
> +		return page;
> +	cramfs_read_putpage(page);
> +	printk("cramfs_read_getpage %d %p\n", 5 - readagain + 1, page);
> +	if (readagain--)
> +		goto retry;
> +	return NULL;
> +}

Oh, OK, we cannot use read_cache_page().  Because the above is still racy
(and still needs the retry loop).  We need to set PG_checked before
launching the read.  Something like:

/*
 * Return a locked, uptodate, !PagePrivate, !PageChecked page which needs a
 * single put_page by the caller.
 */
cramfs_read_getpage()
{
	page = find_lock_page()
	if (PageUptodate(page))
		return page;
	SetPagePrivate()
	SetPageChecked()
	err = m->a_ops->readpage(page);
	if (err) {
		ClearPageChecked(page);
		ClearPagePrivate(page);
		put_page(page);
		return NULL;
	}
	lock_page(page);
	ClearPageChecked(page);
	ClearPagePrivate(page);
	if (page->mapping == NULL) {
		/* truncate got there first */
		unlock_page(page);
		put_page(page);
		return NULL;
	}
	if (PageError(page)) {
		/* IO error */
		unlock_page(page);
		put_page(page);
		return NULL;
	}
	return page;			/* It's locked */
	

You'll see that the timing window here for invalidate_inode_pages() is very
small - invalidate has to come in at the exact right time after the page has
come unlocked so that its trylock wins the race against this function.

So I'd suggest that to get the cramfs_releasepage() code path tested you'd
need to do:

	wait_on_page_locked();
	msleep(1);
	lock_page()

above, to give invalidate a wider window to hit.

> +		if (blocknr + i < devsize) {
> +			page = cramfs_read_getpage(mapping, blocknr + i);
> +			if (page) {
> +				memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE);
> +				kunmap_atomic(page, KM_USER0);

kunmap_atomic() takes the virtual address, not a page*

	void *kaddr = kmap_atomic(...);
	memcpy(..., kaddr, ...);
	kunmap_atomic(kaddr);

As to why it's leaking: dunno, sorry.  It looks OK.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-02 19:41                     ` Andrew Morton
@ 2006-06-02 21:06                       ` Olaf Hering
  0 siblings, 0 replies; 23+ messages in thread
From: Olaf Hering @ 2006-06-02 21:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro

 On Fri, Jun 02, Andrew Morton wrote:

> Oh, OK, we cannot use read_cache_page().  Because the above is still racy
> (and still needs the retry loop).  We need to set PG_checked before
> launching the read.  Something like:
> 
> /*
>  * Return a locked, uptodate, !PagePrivate, !PageChecked page which needs a
>  * single put_page by the caller.
>  */
> cramfs_read_getpage()
> {
> 	page = find_lock_page()
> 	if (PageUptodate(page))
> 		return page;
> 	SetPagePrivate()
> 	SetPageChecked()
> 	err = m->a_ops->readpage(page);

I guess we have to put a find_or_create_page in there, and fill it.
Will try that tomorrow.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-02  9:11                 ` Andrew Morton
  2006-06-02 19:14                   ` Olaf Hering
@ 2006-06-02 19:37                   ` Olaf Hering
  2006-06-02 19:46                     ` Andrew Morton
  1 sibling, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-06-02 19:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro

 On Fri, Jun 02, Andrew Morton wrote:

> On Fri, 2 Jun 2006 10:43:27 +0200
> Olaf Hering <olh@suse.de> wrote:
> 
> >  On Thu, Jun 01, Andrew Morton wrote:
> > 
> > 
> > > > Do you want it like that?
> > > > 
> > > > lock_page(page);
> > > > if (PageUptodate(page)) {
> > > >         SetPageDirty(page);
> > > >         mb();
> > > >         return page;
> > > > }
> > > 
> > > Not really ;)  It's hacky.  It'd be better to take a lock.
> > 
> > Which lock exactly?
> 
> Ah, sorry, there isn't such a lock.  I was just carrying on.
> 
> > I'm not sure how to proceed from here.
> 
> I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked
> page and implement a_ops.releasepage(), which will fail if PageChecked(),
> and will succeed otherwise:

No leak without tweaking PG_private.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-02 19:37                   ` Olaf Hering
@ 2006-06-02 19:46                     ` Andrew Morton
  2006-06-03 13:13                       ` Olaf Hering
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2006-06-02 19:46 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, viro

On Fri, 2 Jun 2006 21:37:02 +0200
Olaf Hering <olh@suse.de> wrote:

> > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked
> > page and implement a_ops.releasepage(), which will fail if PageChecked(),
> > and will succeed otherwise:
> 
> No leak without tweaking PG_private.

Odd.  That would imply that PG_private is being left set somehow (it will
make the page unreclaimable).  But I don't see it.

Plus if we have lots of PagePrivate() pages floating about you should see
your ->releasepage() being called.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-02 19:46                     ` Andrew Morton
@ 2006-06-03 13:13                       ` Olaf Hering
  0 siblings, 0 replies; 23+ messages in thread
From: Olaf Hering @ 2006-06-03 13:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro

 On Fri, Jun 02, Andrew Morton wrote:

> On Fri, 2 Jun 2006 21:37:02 +0200
> Olaf Hering <olh@suse.de> wrote:
> 
> > > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked
> > > page and implement a_ops.releasepage(), which will fail if PageChecked(),
> > > and will succeed otherwise:
> > 
> > No leak without tweaking PG_private.
> 
> Odd.  That would imply that PG_private is being left set somehow (it will
> make the page unreclaimable).  But I don't see it.
> 
> Plus if we have lots of PagePrivate() pages floating about you should see
> your ->releasepage() being called.

The change from Chris leaks as well, but very very slowly.

Please wait while I dig into the page states...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 18:49   ` [PATCH] " Olaf Hering
  2006-06-01 19:12     ` Andrew Morton
@ 2006-06-01 20:17     ` Chris Mason
  2006-06-01 20:20       ` Olaf Hering
  2006-09-20 13:20     ` Olaf Hering
  2 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2006-06-01 20:17 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Morton, linux-kernel, Al Viro

On Thursday 01 June 2006 14:49, Olaf Hering wrote:
> This script will cause cramfs decompression errors, on SMP at least:
>
> #!/bin/bash
> while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
> while :;do ps faxs  </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do dmesg    </dev/null &>/dev/null&done </dev/null &>/dev/null&
> while :;do find /mounts/instsys -type f -print0|xargs -0 cat
> &>/dev/null;done

It looks as though:

* cramfs_readpage is synchronous
* cramfs_readpage always returns an up to date page
* cramfs data doesn't change
* read_cache_page as called by cramfs will always return a page that was 
up to date at one time, or an error.

I think this will work (but have not tested it).  Another option is to create 
a read_cache_page that pins the page via a page flag 
that invalidate_mapping_pages will honor.

-chris

diff -r a1a07af2d0cd fs/cramfs/inode.c
--- a/fs/cramfs/inode.c	Thu Jun 01 10:45:04 2006 -0400
+++ b/fs/cramfs/inode.c	Thu Jun 01 15:04:39 2006 -0400
@@ -190,18 +190,6 @@ static void *cramfs_read(struct super_bl
 		pages[i] = page;
 	}
 
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
-		if (page) {
-			wait_on_page_locked(page);
-			if (!PageUptodate(page)) {
-				/* asynchronous error */
-				page_cache_release(page);
-				pages[i] = NULL;
-			}
-		}
-	}
-
 	buffer = next_buffer;
 	next_buffer = NEXT_BUFFER(buffer);
 	buffer_blocknr[buffer] = blocknr;


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 20:17     ` Chris Mason
@ 2006-06-01 20:20       ` Olaf Hering
  2006-06-01 20:29         ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-06-01 20:20 UTC (permalink / raw)
  To: Chris Mason; +Cc: Andrew Morton, linux-kernel, Al Viro

 On Thu, Jun 01, Chris Mason wrote:

> I think this will work (but have not tested it).  Another option is to create 
> a read_cache_page that pins the page via a page flag 
> that invalidate_mapping_pages will honor.

PageLocked or PageDirty, the latter only with a mb().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 20:20       ` Olaf Hering
@ 2006-06-01 20:29         ` Chris Mason
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Mason @ 2006-06-01 20:29 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Morton, linux-kernel, Al Viro

On Thursday 01 June 2006 16:20, Olaf Hering wrote:
>  On Thu, Jun 01, Chris Mason wrote:
> > I think this will work (but have not tested it).  Another option is to
> > create a read_cache_page that pins the page via a page flag
> > that invalidate_mapping_pages will honor.
>
> PageLocked or PageDirty, the latter only with a mb().

The problem is we need the bit to be set before we set the page up to date.  A 
locked page will never make it through the readpage() mechanisms and a dirty 
page that isn't up to date is not quite legal.

For cramfs, dirty would work I suppose.

-chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-06-01 18:49   ` [PATCH] " Olaf Hering
  2006-06-01 19:12     ` Andrew Morton
  2006-06-01 20:17     ` Chris Mason
@ 2006-09-20 13:20     ` Olaf Hering
  2006-09-20 18:47       ` Andrew Morton
  2 siblings, 1 reply; 23+ messages in thread
From: Olaf Hering @ 2006-09-20 13:20 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, Al Viro

On Thu, Jun 01, Olaf Hering wrote:

> ...
> Error -3 while decompressing!
> c0000000009592a2(2649)->c0000000edf87000(4096)
> Error -3 while decompressing!
> c000000000959298(2520)->c0000000edbc7000(4096)
> Error -3 while decompressing!
> c000000000959c70(2489)->c0000000f1482000(4096) 
> Error -3 while decompressing!
> c00000000095a629(2355)->c0000000edaff000(4096)
> Error -3 while decompressing!
> ...

Today I looked at this bug again and found that 2.6.18-rc6-git2 has
fix for this. Is the patch below supposed to fix the cramfs corruption
or does it just paper over the bug?

...
cramfs_read() clears parts of the src buffer because the page is not
uptodate. invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set
ClearPageUptodate() after cramfs_read() got the page from read_cache_page()
...

/root/cramfscrash.sh
#!/bin/bash
# cd /dev/shm/
# tar xfz /mounts/mirror/kernel/v2.6/linux-2.6.18.tar.gz
# cd linux-2.6.18/
# mkfs.cramfs drivers /tmp/cramfs.image
mount -vnt proc proc /proc
echo 1 > /proc/sys/kernel/panic
echo 9 > /proc/sysrq-trigger
mount -vnt sysfs sysfs /sys
modprobe -v loop
mount -vnt cramfs -o loop /tmp/cramfs.image /mnt
while :;do /sbin/blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
while :;do /usr/bin/find /mnt -type f -print0|xargs -0 cat &>/dev/null;done


kernel cmdline
xmon=off panic=1 sysrq=1 quiet root=/dev/disk/by-uuid/d50e4029-2e91-4332-bb16-24f946a74d3f ro init=/root/cramfscrash.sh



 016eb4a0ed06a3677d67a584da901f0e9a63c666.patch
From: Andrew Morton <akpm@osdl.org>

If a CPU faults this page into pagetables after invalidate_mapping_pages()
checked page_mapped(), invalidate_complete_page() will still proceed to remove
the page from pagecache.  This leaves the page-faulting process with a
detached page.  If it was MAP_SHARED then file data loss will ensue.

Fix that up by checking the page's refcount after taking tree_lock.

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 mm/truncate.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff -puN mm/truncate.c~invalidate_complete_page-race-fix mm/truncate.c
--- a/mm/truncate.c~invalidate_complete_page-race-fix
+++ a/mm/truncate.c
@@ -68,10 +68,10 @@ invalidate_complete_page(struct address_
 		return 0;
 
 	write_lock_irq(&mapping->tree_lock);
-	if (PageDirty(page)) {
-		write_unlock_irq(&mapping->tree_lock);
-		return 0;
-	}
+	if (PageDirty(page))
+		goto failed;
+	if (page_count(page) != 2)	/* caller's ref + pagecache ref */
+		goto failed;
 
 	BUG_ON(PagePrivate(page));
 	__remove_from_page_cache(page);
@@ -79,6 +79,9 @@ invalidate_complete_page(struct address_
 	ClearPageUptodate(page);
 	page_cache_release(page);	/* pagecache ref */
 	return 1;
+failed:
+	write_unlock_irq(&mapping->tree_lock);
+	return 0;
 }
 
 /**
_

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device
  2006-09-20 13:20     ` Olaf Hering
@ 2006-09-20 18:47       ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2006-09-20 18:47 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, Al Viro

On Wed, 20 Sep 2006 15:20:11 +0200
Olaf Hering <olh@suse.de> wrote:

> On Thu, Jun 01, Olaf Hering wrote:
> 
> > ...
> > Error -3 while decompressing!
> > c0000000009592a2(2649)->c0000000edf87000(4096)
> > Error -3 while decompressing!
> > c000000000959298(2520)->c0000000edbc7000(4096)
> > Error -3 while decompressing!
> > c000000000959c70(2489)->c0000000f1482000(4096) 
> > Error -3 while decompressing!
> > c00000000095a629(2355)->c0000000edaff000(4096)
> > Error -3 while decompressing!
> > ...
> 
> Today I looked at this bug again and found that 2.6.18-rc6-git2 has
> fix for this. Is the patch below supposed to fix the cramfs corruption
> or does it just paper over the bug?
> 
> ...
> cramfs_read() clears parts of the src buffer because the page is not
> uptodate. invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set
> ClearPageUptodate() after cramfs_read() got the page from read_cache_page()
> ...
> 
> /root/cramfscrash.sh
> #!/bin/bash
> # cd /dev/shm/
> # tar xfz /mounts/mirror/kernel/v2.6/linux-2.6.18.tar.gz
> # cd linux-2.6.18/
> # mkfs.cramfs drivers /tmp/cramfs.image
> mount -vnt proc proc /proc
> echo 1 > /proc/sys/kernel/panic
> echo 9 > /proc/sysrq-trigger

That's a novel way of setting the log level.

> mount -vnt sysfs sysfs /sys
> modprobe -v loop
> mount -vnt cramfs -o loop /tmp/cramfs.image /mnt
> while :;do /sbin/blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
> while :;do /usr/bin/find /mnt -type f -print0|xargs -0 cat &>/dev/null;done
> 
> 
> kernel cmdline
> xmon=off panic=1 sysrq=1 quiet root=/dev/disk/by-uuid/d50e4029-2e91-4332-bb16-24f946a74d3f ro init=/root/cramfscrash.sh
> 
> 
> 
>  016eb4a0ed06a3677d67a584da901f0e9a63c666.patch
> From: Andrew Morton <akpm@osdl.org>
> 
> If a CPU faults this page into pagetables after invalidate_mapping_pages()
> checked page_mapped(), invalidate_complete_page() will still proceed to remove
> the page from pagecache.  This leaves the page-faulting process with a
> detached page.  If it was MAP_SHARED then file data loss will ensue.
> 
> Fix that up by checking the page's refcount after taking tree_lock.

Yes, I think this is a reasonable solution to the cramfs race.  Previously,
invalidate_inode_pages() was removing pages from pagecache even if some
other thread had a ref against them, which is pretty bad behaviour.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2006-09-20 18:47 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-29 21:40 cramfs corruption after BLKFLSBUF on loop device Olaf Hering
2006-05-30 13:19 ` Olaf Hering
2006-05-30 18:24 ` Olaf Hering
2006-06-01 18:49   ` [PATCH] " Olaf Hering
2006-06-01 19:12     ` Andrew Morton
2006-06-01 19:15       ` Andrew Morton
2006-06-01 20:10       ` Olaf Hering
2006-06-01 21:24         ` Andrew Morton
2006-06-01 21:41           ` Olaf Hering
2006-06-01 21:57             ` Andrew Morton
2006-06-02  8:43               ` Olaf Hering
2006-06-02  9:11                 ` Andrew Morton
2006-06-02 19:14                   ` Olaf Hering
2006-06-02 19:41                     ` Andrew Morton
2006-06-02 21:06                       ` Olaf Hering
2006-06-02 19:37                   ` Olaf Hering
2006-06-02 19:46                     ` Andrew Morton
2006-06-03 13:13                       ` Olaf Hering
2006-06-01 20:17     ` Chris Mason
2006-06-01 20:20       ` Olaf Hering
2006-06-01 20:29         ` Chris Mason
2006-09-20 13:20     ` Olaf Hering
2006-09-20 18:47       ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox