* cramfs corruption after BLKFLSBUF on loop device
@ 2006-05-29 21:40 Olaf Hering
2006-05-30 13:19 ` Olaf Hering
2006-05-30 18:24 ` Olaf Hering
0 siblings, 2 replies; 23+ messages in thread
From: Olaf Hering @ 2006-05-29 21:40 UTC (permalink / raw)
To: linux-kernel
This script will cause cramfs decompression errors, on SMP at least:
#!/bin/bash
while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null&
while :;do ps faxs </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do dmesg </dev/null &>/dev/null&done </dev/null &>/dev/null&
while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done
...
Error -3 while decompressing!
c0000000009592a2(2649)->c0000000edf87000(4096)
Error -3 while decompressing!
c000000000959298(2520)->c0000000edbc7000(4096)
Error -3 while decompressing!
c000000000959c70(2489)->c0000000f1482000(4096)
Error -3 while decompressing!
c00000000095a629(2355)->c0000000edaff000(4096)
Error -3 while decompressing!
...
evms_access does the ioctl (lots of them) on the loop device.
Its a long standing bug, 2.6.5 fails as well. cramfs_read() clears parts
of the src buffer because the page is not uptodate. invalidate_bdev()
touched the page last.
cramfs_read() was called from line 480 or 490 when the
PageUptodate(page) test fails.
...
464 static int cramfs_readpage(struct file *file, struct page * page)
..
479 if (page->index)
480 start_offset = *(u32 *) cramfs_read(sb, blkptr_offset-4, 4);
..
488 bytes_filled = cramfs_uncompress_block(pgdata,
489 PAGE_CACHE_SIZE,
490 cramfs_read(sb, start_offset, compr_len),
491 compr_len);
...
There are rumors that cramfs is not smp safe...
Maybe the only hope is to tell evms to not do that ioctl for loop.
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: cramfs corruption after BLKFLSBUF on loop device 2006-05-29 21:40 cramfs corruption after BLKFLSBUF on loop device Olaf Hering @ 2006-05-30 13:19 ` Olaf Hering 2006-05-30 18:24 ` Olaf Hering 1 sibling, 0 replies; 23+ messages in thread From: Olaf Hering @ 2006-05-30 13:19 UTC (permalink / raw) To: linux-kernel On Mon, May 29, Olaf Hering wrote: > This script will cause cramfs decompression errors, on SMP at least: > > #!/bin/bash > while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& > while :;do ps faxs </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do dmesg </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done > > ... > Error -3 while decompressing! > c0000000009592a2(2649)->c0000000edf87000(4096) > Error -3 while decompressing! > c000000000959298(2520)->c0000000edbc7000(4096) > Error -3 while decompressing! > c000000000959c70(2489)->c0000000f1482000(4096) > Error -3 while decompressing! > c00000000095a629(2355)->c0000000edaff000(4096) > Error -3 while decompressing! > ... This change works for me, the added BUG() does not trigger. read_cache_page() returns the page in PageUptodate() state. But a few ticks later, invalidate_complete_page() calls ClearPageUptodate(), on another cpu. The SetPageDirty() works for my testcase, but not without the mb(). Does anyone know what sideeffects the SetPageDirty() has for the loopmounted cramfs? --- fs/cramfs/inode.c | 2 ++ fs/cramfs/uncompress.c | 1 + 2 files changed, 3 insertions(+) Index: linux-2.6.16.16-1.6/fs/cramfs/inode.c =================================================================== --- linux-2.6.16.16-1.6.orig/fs/cramfs/inode.c +++ linux-2.6.16.16-1.6/fs/cramfs/inode.c @@ -186,6 +186,8 @@ static void *cramfs_read(struct super_bl /* synchronous error? */ if (IS_ERR(page)) page = NULL; + SetPageDirty(page); + mb(); } pages[i] = page; } Index: linux-2.6.16.16-1.6/fs/cramfs/uncompress.c =================================================================== --- linux-2.6.16.16-1.6.orig/fs/cramfs/uncompress.c +++ linux-2.6.16.16-1.6/fs/cramfs/uncompress.c @@ -50,6 +50,7 @@ int cramfs_uncompress_block(void *dst, i err: printk("Error %d while decompressing!\n", err); printk("%p(%d)->%p(%d)\n", src, srclen, dst, dstlen); + BUG_ON(1); return 0; } ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cramfs corruption after BLKFLSBUF on loop device 2006-05-29 21:40 cramfs corruption after BLKFLSBUF on loop device Olaf Hering 2006-05-30 13:19 ` Olaf Hering @ 2006-05-30 18:24 ` Olaf Hering 2006-06-01 18:49 ` [PATCH] " Olaf Hering 1 sibling, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-05-30 18:24 UTC (permalink / raw) To: linux-kernel, Al Viro On Mon, May 29, Olaf Hering wrote: > This script will cause cramfs decompression errors, on SMP at least: > > #!/bin/bash > while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& > while :;do ps faxs </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do dmesg </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done > > ... > Error -3 while decompressing! > c0000000009592a2(2649)->c0000000edf87000(4096) > Error -3 while decompressing! > c000000000959298(2520)->c0000000edbc7000(4096) > Error -3 while decompressing! > c000000000959c70(2489)->c0000000f1482000(4096) > Error -3 while decompressing! > c00000000095a629(2355)->c0000000edaff000(4096) > Error -3 while decompressing! > ... > > evms_access does the ioctl (lots of them) on the loop device. > Its a long standing bug, 2.6.5 fails as well. cramfs_read() clears parts > of the src buffer because the page is not uptodate. invalidate_bdev() > touched the page last. > cramfs_read() was called from line 480 or 490 when the > PageUptodate(page) test fails. Al, you added the PageUptodate check for 2.6.2. http://linux.bkbits.net:8080/linux-2.6/gnupatch@400c1cddyzRoKomOj57xxUAmKnMbZQ Should there be some locking for blockdev --flushbufs, or is the check just bogus? ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-05-30 18:24 ` Olaf Hering @ 2006-06-01 18:49 ` Olaf Hering 2006-06-01 19:12 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Olaf Hering @ 2006-06-01 18:49 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Al Viro This script will cause cramfs decompression errors, on SMP at least: #!/bin/bash while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& while :;do ps faxs </dev/null &>/dev/null&done </dev/null &>/dev/null& while :;do dmesg </dev/null &>/dev/null&done </dev/null &>/dev/null& while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done (The used executables come from the symlinked /mounts/instsys directory) ... Error -3 while decompressing! c0000000009592a2(2649)->c0000000edf87000(4096) Error -3 while decompressing! c000000000959298(2520)->c0000000edbc7000(4096) Error -3 while decompressing! c000000000959c70(2489)->c0000000f1482000(4096) Error -3 while decompressing! c00000000095a629(2355)->c0000000edaff000(4096) Error -3 while decompressing! ... Its a long standing bug, introduced in 2.6.2. cramfs_read() clears parts of the src buffer because the page is not uptodate. invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set ClearPageUptodate() after cramfs_read() got the page from read_cache_page() If PageUptodate() fails, read the page again before using it. There is still a small window were the page may not be uptodate before copying its contents away. evms_access does the BLKFLSBUF ioctl (lots of them) on the loop device. This will corrupt the SuSE installation image on SMP kernels, leading to random segfaults. Signed-off-by: Olaf Hering <olh@suse.de> --- fs/cramfs/inode.c | 70 ++++++++++++++++++++++++------------------------------ 1 file changed, 32 insertions(+), 38 deletions(-) Index: linux-2.6.16.16-1.6/fs/cramfs/inode.c =================================================================== --- linux-2.6.16.16-1.6.orig/fs/cramfs/inode.c +++ linux-2.6.16.16-1.6/fs/cramfs/inode.c @@ -147,8 +147,8 @@ static int next_buffer; static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len) { struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping; - struct page *pages[BLKS_PER_BUF]; - unsigned i, blocknr, buffer, unread; + struct page *page; + unsigned i, blocknr, buffer, readagain; unsigned long devsize; char *data; @@ -174,48 +174,42 @@ static void *cramfs_read(struct super_bl devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT; - /* Ok, read in BLKS_PER_BUF pages completely first. */ - unread = 0; - for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = NULL; - - if (blocknr + i < devsize) { - page = read_cache_page(mapping, blocknr + i, - (filler_t *)mapping->a_ops->readpage, - NULL); - /* synchronous error? */ - if (IS_ERR(page)) - page = NULL; - } - pages[i] = page; - } - - for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = pages[i]; - if (page) { - wait_on_page_locked(page); - if (!PageUptodate(page)) { - /* asynchronous error */ - page_cache_release(page); - pages[i] = NULL; - } - } - } - buffer = next_buffer; next_buffer = NEXT_BUFFER(buffer); buffer_blocknr[buffer] = blocknr; buffer_dev[buffer] = sb; - data = read_buffers[buffer]; + for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = pages[i]; - if (page) { - memcpy(data, kmap(page), PAGE_CACHE_SIZE); - kunmap(page); - page_cache_release(page); - } else - memset(data, 0, PAGE_CACHE_SIZE); + if (blocknr + i < devsize) { + page = NULL; + for (readagain = 0; readagain < 5; readagain++) { + page = read_cache_page(mapping, blocknr + i, + (filler_t *)mapping->a_ops->readpage, + NULL); + /* synchronous error? */ + if (IS_ERR(page)) { + page = NULL; + break; + } + wait_on_page_locked(page); + if (PageUptodate(page)) + break; + /* asynchronous error */ + /* maybe BLKFLSBUF flushed the page */ + page_cache_release(page); + page = NULL; + } + if (page) { + memcpy(data, kmap(page), PAGE_CACHE_SIZE); + kunmap(page); + page_cache_release(page); + } else + memset(data, 0, PAGE_CACHE_SIZE); + if (readagain) + printk(KERN_DEBUG "cramfs_read got %s Uptodate page after %d attempt(s)\n", + page ? "an" : "no", readagain); + } data += PAGE_CACHE_SIZE; } return read_buffers[buffer] + offset; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 18:49 ` [PATCH] " Olaf Hering @ 2006-06-01 19:12 ` Andrew Morton 2006-06-01 19:15 ` Andrew Morton 2006-06-01 20:10 ` Olaf Hering 2006-06-01 20:17 ` Chris Mason 2006-09-20 13:20 ` Olaf Hering 2 siblings, 2 replies; 23+ messages in thread From: Andrew Morton @ 2006-06-01 19:12 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel, viro On Thu, 1 Jun 2006 20:49:38 +0200 Olaf Hering <olh@suse.de> wrote: > > This script will cause cramfs decompression errors, on SMP at least: > > #!/bin/bash > while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& > while :;do ps faxs </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do dmesg </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done > > (The used executables come from the symlinked /mounts/instsys directory) > > ... > Error -3 while decompressing! > c0000000009592a2(2649)->c0000000edf87000(4096) > Error -3 while decompressing! > c000000000959298(2520)->c0000000edbc7000(4096) > Error -3 while decompressing! > c000000000959c70(2489)->c0000000f1482000(4096) > Error -3 while decompressing! > c00000000095a629(2355)->c0000000edaff000(4096) > Error -3 while decompressing! > ... > > Its a long standing bug, introduced in 2.6.2. > > cramfs_read() clears parts of the src buffer because the page is not uptodate. > invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set ClearPageUptodate() > after cramfs_read() got the page from read_cache_page() > If PageUptodate() fails, read the page again before using it. > There is still a small window were the page may not be uptodate before copying > its contents away. > > evms_access does the BLKFLSBUF ioctl (lots of them) on the loop device. This will > corrupt the SuSE installation image on SMP kernels, leading to random segfaults. > OK, invalidate_inode_pages(). > + > for (i = 0; i < BLKS_PER_BUF; i++) { > - struct page *page = pages[i]; > - if (page) { > - memcpy(data, kmap(page), PAGE_CACHE_SIZE); > - kunmap(page); > - page_cache_release(page); > - } else > - memset(data, 0, PAGE_CACHE_SIZE); > + if (blocknr + i < devsize) { > + page = NULL; > + for (readagain = 0; readagain < 5; readagain++) { > + page = read_cache_page(mapping, blocknr + i, > + (filler_t *)mapping->a_ops->readpage, > + NULL); > + /* synchronous error? */ > + if (IS_ERR(page)) { > + page = NULL; > + break; > + } > + wait_on_page_locked(page); > + if (PageUptodate(page)) > + break; > + /* asynchronous error */ > + /* maybe BLKFLSBUF flushed the page */ > + page_cache_release(page); > + page = NULL; > + } > + if (page) { > + memcpy(data, kmap(page), PAGE_CACHE_SIZE); > + kunmap(page); > + page_cache_release(page); > + } else > + memset(data, 0, PAGE_CACHE_SIZE); > + if (readagain) > + printk(KERN_DEBUG "cramfs_read got %s Uptodate page after %d attempt(s)\n", > + page ? "an" : "no", readagain); > + } This code absolutely needs a comment telling the poor reader what it's in there for. It's still racy. Do the memcpy while the page is locked: retry: read_cache_page() lock_page() if (!PageUptodate()) { if (readagain-- != 0) goto retry; give_up(); } memcpy(); unlock_page(); Also, let's use kmap_atomic() while we're in there. Of course, this will all just fail less often than it presently does. We'd be better off taking a lock if poss to keep the ioctl away. I'd have thought that it'd be appropriate to take i_mutex while running invalidate_inode_pages(). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 19:12 ` Andrew Morton @ 2006-06-01 19:15 ` Andrew Morton 2006-06-01 20:10 ` Olaf Hering 1 sibling, 0 replies; 23+ messages in thread From: Andrew Morton @ 2006-06-01 19:15 UTC (permalink / raw) To: olh, linux-kernel, viro On Thu, 1 Jun 2006 12:12:00 -0700 Andrew Morton <akpm@osdl.org> wrote: > I'd have > thought that it'd be appropriate to take i_mutex while running > invalidate_inode_pages(). Let me add: it'd be appropriate, but also difficult - invalidate_inode_pages() is called from all sorts of contexts, including inside spinlock. A quickie fix would be to take i_mutex at the top level, on entry to the ioctl. But that'll only fix the BLKFLSBUF case. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 19:12 ` Andrew Morton 2006-06-01 19:15 ` Andrew Morton @ 2006-06-01 20:10 ` Olaf Hering 2006-06-01 21:24 ` Andrew Morton 1 sibling, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-06-01 20:10 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, viro This script will cause cramfs decompression errors, on SMP at least: #!/bin/bash while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& while :;do ps faxs </dev/null &>/dev/null&done </dev/null &>/dev/null& while :;do dmesg </dev/null &>/dev/null&done </dev/null &>/dev/null& while :;do find /mounts/instsys -type f -print0|xargs -0 cat &>/dev/null;done (The used executables come from the symlinked /mounts/instsys directory) ... Error -3 while decompressing! c0000000009592a2(2649)->c0000000edf87000(4096) Error -3 while decompressing! c000000000959298(2520)->c0000000edbc7000(4096) Error -3 while decompressing! c000000000959c70(2489)->c0000000f1482000(4096) Error -3 while decompressing! c00000000095a629(2355)->c0000000edaff000(4096) Error -3 while decompressing! ... Its a long standing bug, introduced in 2.6.2. cramfs_read() clears parts of the src buffer because the page is not uptodate. invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set ClearPageUptodate() after cramfs_read() got the page from read_cache_page() If PageUptodate() fails, read the page again before using it. evms_access does the BLKFLSBUF ioctl (lots of them) on the loop device. This will corrupt the SuSE installation image on SMP kernels, leading to random segfaults. Signed-off-by: Olaf Hering <olh@suse.de> --- fs/cramfs/inode.c | 70 ++++++++++++++++++++++++------------------------------ 1 file changed, 32 insertions(+), 38 deletions(-) Index: linux-2.6.16.16-1.6/fs/cramfs/inode.c =================================================================== --- linux-2.6.16.16-1.6.orig/fs/cramfs/inode.c +++ linux-2.6.16.16-1.6/fs/cramfs/inode.c @@ -140,6 +140,25 @@ static unsigned buffer_blocknr[READ_BUFF static struct super_block * buffer_dev[READ_BUFFERS]; static int next_buffer; +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */ +static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n) +{ + struct page *page; + int readagain = 5; +retry: + page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL); + if (IS_ERR(page)) + return NULL; + lock_page(page); + if (PageUptodate(page)) + return page; + unlock_page(page); + page_cache_release(page); + if (readagain--) + goto retry; + return NULL; +} + /* * Returns a pointer to a buffer containing at least LEN bytes of * filesystem starting at byte offset OFFSET into the filesystem. @@ -147,8 +166,8 @@ static int next_buffer; static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len) { struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping; - struct page *pages[BLKS_PER_BUF]; - unsigned i, blocknr, buffer, unread; + struct page *page; + unsigned i, blocknr, buffer; unsigned long devsize; char *data; @@ -174,48 +193,23 @@ static void *cramfs_read(struct super_bl devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT; - /* Ok, read in BLKS_PER_BUF pages completely first. */ - unread = 0; - for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = NULL; - - if (blocknr + i < devsize) { - page = read_cache_page(mapping, blocknr + i, - (filler_t *)mapping->a_ops->readpage, - NULL); - /* synchronous error? */ - if (IS_ERR(page)) - page = NULL; - } - pages[i] = page; - } - - for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = pages[i]; - if (page) { - wait_on_page_locked(page); - if (!PageUptodate(page)) { - /* asynchronous error */ - page_cache_release(page); - pages[i] = NULL; - } - } - } - buffer = next_buffer; next_buffer = NEXT_BUFFER(buffer); buffer_blocknr[buffer] = blocknr; buffer_dev[buffer] = sb; - data = read_buffers[buffer]; + for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = pages[i]; - if (page) { - memcpy(data, kmap(page), PAGE_CACHE_SIZE); - kunmap(page); - page_cache_release(page); - } else - memset(data, 0, PAGE_CACHE_SIZE); + if (blocknr + i < devsize) { + page = cramfs_read_cache_page(mapping, blocknr + i); + if (page) { + memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE); + kunmap(page); + unlock_page(page); + page_cache_release(page); + } else + memset(data, 0, PAGE_CACHE_SIZE); + } data += PAGE_CACHE_SIZE; } return read_buffers[buffer] + offset; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 20:10 ` Olaf Hering @ 2006-06-01 21:24 ` Andrew Morton 2006-06-01 21:41 ` Olaf Hering 0 siblings, 1 reply; 23+ messages in thread From: Andrew Morton @ 2006-06-01 21:24 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel, viro Olaf Hering <olh@suse.de> wrote: > > > > +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */ > +static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n) > +{ > + struct page *page; > + int readagain = 5; > +retry: > + page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL); > + if (IS_ERR(page)) > + return NULL; > + lock_page(page); > + if (PageUptodate(page)) > + return page; > + unlock_page(page); > + page_cache_release(page); > + if (readagain--) > + goto retry; > + return NULL; > +} Better, but it's still awful, isn't it? The things you were discussing with Chris look more promising. PG_Dirty would be a bit of a hack, but at least it'd be a 100% reliable hack, whereas the above is a whatever-the-previous-failure-rate-was-to-the-fifth hack. > + page = cramfs_read_cache_page(mapping, blocknr + i); > + if (page) { > + memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE); > + kunmap(page); kunmap_atomic, please. > + unlock_page(page); > + page_cache_release(page); > + } else > + memset(data, 0, PAGE_CACHE_SIZE); > + } > data += PAGE_CACHE_SIZE; > } > return read_buffers[buffer] + offset; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 21:24 ` Andrew Morton @ 2006-06-01 21:41 ` Olaf Hering 2006-06-01 21:57 ` Andrew Morton 0 siblings, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-06-01 21:41 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, viro On Thu, Jun 01, Andrew Morton wrote: > Olaf Hering <olh@suse.de> wrote: > > > > > > > > +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */ > > +static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n) > > +{ > > + struct page *page; > > + int readagain = 5; > > +retry: > > + page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL); > > + if (IS_ERR(page)) > > + return NULL; > > + lock_page(page); > > + if (PageUptodate(page)) > > + return page; > > + unlock_page(page); > > + page_cache_release(page); > > + if (readagain--) > > + goto retry; > > + return NULL; > > +} > > Better, but it's still awful, isn't it? The things you were discussing > with Chris look more promising. PG_Dirty would be a bit of a hack, but at > least it'd be a 100% reliable hack, whereas the above is a > whatever-the-previous-failure-rate-was-to-the-fifth hack. Do you want it like that? lock_page(page); if (PageUptodate(page)) { SetPageDirty(page); mb(); return page; } and perhaps a ClearPageDirty() after memcpy. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 21:41 ` Olaf Hering @ 2006-06-01 21:57 ` Andrew Morton 2006-06-02 8:43 ` Olaf Hering 0 siblings, 1 reply; 23+ messages in thread From: Andrew Morton @ 2006-06-01 21:57 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel, viro Olaf Hering <olh@suse.de> wrote: > > On Thu, Jun 01, Andrew Morton wrote: > > > Olaf Hering <olh@suse.de> wrote: > > > > > > > > > > > > +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */ > > > +static struct page *cramfs_read_cache_page(struct address_space *m, unsigned int n) > > > +{ > > > + struct page *page; > > > + int readagain = 5; > > > +retry: > > > + page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL); > > > + if (IS_ERR(page)) > > > + return NULL; > > > + lock_page(page); > > > + if (PageUptodate(page)) > > > + return page; > > > + unlock_page(page); > > > + page_cache_release(page); > > > + if (readagain--) > > > + goto retry; > > > + return NULL; > > > +} > > > > Better, but it's still awful, isn't it? The things you were discussing > > with Chris look more promising. PG_Dirty would be a bit of a hack, but at > > least it'd be a 100% reliable hack, whereas the above is a > > whatever-the-previous-failure-rate-was-to-the-fifth hack. > > Do you want it like that? > > lock_page(page); > if (PageUptodate(page)) { > SetPageDirty(page); > mb(); > return page; > } Not really ;) It's hacky. It'd be better to take a lock. I expect it'd work though, as long as... > and perhaps a ClearPageDirty() after memcpy. I assume this is a read-only filesystem? I mean, if someone had really tried to dirty the page in the meanwhile via, say, munmap or msync which don't lock the page, we just lost their data. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 21:57 ` Andrew Morton @ 2006-06-02 8:43 ` Olaf Hering 2006-06-02 9:11 ` Andrew Morton 0 siblings, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-06-02 8:43 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, viro On Thu, Jun 01, Andrew Morton wrote: > > Do you want it like that? > > > > lock_page(page); > > if (PageUptodate(page)) { > > SetPageDirty(page); > > mb(); > > return page; > > } > > Not really ;) It's hacky. It'd be better to take a lock. Which lock exactly? I'm not sure how to proceed from here. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-02 8:43 ` Olaf Hering @ 2006-06-02 9:11 ` Andrew Morton 2006-06-02 19:14 ` Olaf Hering 2006-06-02 19:37 ` Olaf Hering 0 siblings, 2 replies; 23+ messages in thread From: Andrew Morton @ 2006-06-02 9:11 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel, viro On Fri, 2 Jun 2006 10:43:27 +0200 Olaf Hering <olh@suse.de> wrote: > On Thu, Jun 01, Andrew Morton wrote: > > > > > Do you want it like that? > > > > > > lock_page(page); > > > if (PageUptodate(page)) { > > > SetPageDirty(page); > > > mb(); > > > return page; > > > } > > > > Not really ;) It's hacky. It'd be better to take a lock. > > Which lock exactly? Ah, sorry, there isn't such a lock. I was just carrying on. > I'm not sure how to proceed from here. I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked page and implement a_ops.releasepage(), which will fail if PageChecked(), and will succeed otherwise: /* * cramfs_releasepage() will fail if cramfs_read() set PG_checked. This * is so that invalidate_inode_pages() cannot zap the page while * cramfs_read() is trying to get at its contents. */ cramfs_releasepage(...) { if (PageChecked(page)) return 0; return 1; } cramfs_read(...) { lock_page(page); SetPagePrivate(page); SetPageChecked(page); read_mapping_page(...); lock_page(page); if (page->mapping == NULL) { /* truncate got there first */ unlock_page(page); bale(); } memcpy(); ClearPageChecked(page); ClearPagePrivate(page); unlock_page(page); } PG_checked is a filesystem-private flag. It'll soon be renamed to PG_fs_misc. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-02 9:11 ` Andrew Morton @ 2006-06-02 19:14 ` Olaf Hering 2006-06-02 19:41 ` Andrew Morton 2006-06-02 19:37 ` Olaf Hering 1 sibling, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-06-02 19:14 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, viro On Fri, Jun 02, Andrew Morton wrote: > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked > page and implement a_ops.releasepage(), which will fail if PageChecked(), > and will succeed otherwise: > > /* > * cramfs_releasepage() will fail if cramfs_read() set PG_checked. This > * is so that invalidate_inode_pages() cannot zap the page while > * cramfs_read() is trying to get at its contents. > */ > cramfs_releasepage(...) > { > if (PageChecked(page)) > return 0; > return 1; > } This function is not triggered in my testing. > cramfs_read(...) > { > lock_page(page); > SetPagePrivate(page); > SetPageChecked(page); > read_mapping_page(...); > lock_page(page); > if (page->mapping == NULL) { > /* truncate got there first */ > unlock_page(page); > bale(); > } > memcpy(); > ClearPageChecked(page); > ClearPagePrivate(page); > unlock_page(page); > } recursive recursion? Where is page supposed to come from? Did you mean to call all this with a page returned from read_mapping_page(), and call read_mapping_page() twice? My version seems to leak memory somehow, Inactive and slab buffer_head keeps growing. I guess something is missing in the picture. Doesnt read_cache_page lock the page already? read_cache_page __read_cache_page add_to_page_cache_lru add_to_page_cache SetPageLocked --- /tmp/meminfo-1149269715 2006-06-02 13:35:15.439683381 -0400 +++ x 2006-06-02 13:53:20.379683381 -0400 @@ -1,23 +1,23 @@ MemTotal: 3952952 kB -MemFree: 3395708 kB +MemFree: 2966276 kB Buffers: 0 kB -Cached: 328960 kB +Cached: 326904 kB SwapCached: 0 kB -Active: 366884 kB -Inactive: 111496 kB +Active: 381720 kB +Inactive: 518188 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 3952952 kB -LowFree: 3395708 kB +LowFree: 2966276 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 0 kB Writeback: 0 kB -Mapped: 56320 kB -Slab: 60608 kB +Mapped: 60440 kB +Slab: 68704 kB CommitLimit: 1976476 kB -Committed_AS: 155304 kB -PageTables: 1216 kB +Committed_AS: 158780 kB +PageTables: 1112 kB VmallocTotal: 8589934592 kB VmallocUsed: 2452 kB VmallocChunk: 8589932068 kB diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 9efcc3a..1266b8e 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -141,6 +141,35 @@ static unsigned buffer_blocknr[READ_BUFF static struct super_block * buffer_dev[READ_BUFFERS]; static int next_buffer; +static void cramfs_read_putpage(struct page *page) +{ + ClearPageChecked(page); + ClearPagePrivate(page); + unlock_page(page); + page_cache_release(page); +} + +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */ +static struct page *cramfs_read_getpage(struct address_space *m, unsigned int n) +{ + struct page *page; + int readagain = 5; +retry: + page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL); + if (IS_ERR(page)) + return NULL; + lock_page(page); + SetPagePrivate(page); + SetPageChecked(page); + if (PageUptodate(page)) + return page; + cramfs_read_putpage(page); + printk("cramfs_read_getpage %d %p\n", 5 - readagain + 1, page); + if (readagain--) + goto retry; + return NULL; +} + /* * Returns a pointer to a buffer containing at least LEN bytes of * filesystem starting at byte offset OFFSET into the filesystem. @@ -148,8 +177,8 @@ static int next_buffer; static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len) { struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping; - struct page *pages[BLKS_PER_BUF]; - unsigned i, blocknr, buffer, unread; + struct page *page; + unsigned i, blocknr, buffer; unsigned long devsize; char *data; @@ -175,48 +204,22 @@ static void *cramfs_read(struct super_bl devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT; - /* Ok, read in BLKS_PER_BUF pages completely first. */ - unread = 0; - for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = NULL; - - if (blocknr + i < devsize) { - page = read_cache_page(mapping, blocknr + i, - (filler_t *)mapping->a_ops->readpage, - NULL); - /* synchronous error? */ - if (IS_ERR(page)) - page = NULL; - } - pages[i] = page; - } - - for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = pages[i]; - if (page) { - wait_on_page_locked(page); - if (!PageUptodate(page)) { - /* asynchronous error */ - page_cache_release(page); - pages[i] = NULL; - } - } - } - buffer = next_buffer; next_buffer = NEXT_BUFFER(buffer); buffer_blocknr[buffer] = blocknr; buffer_dev[buffer] = sb; - data = read_buffers[buffer]; + for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = pages[i]; - if (page) { - memcpy(data, kmap(page), PAGE_CACHE_SIZE); - kunmap(page); - page_cache_release(page); - } else - memset(data, 0, PAGE_CACHE_SIZE); + if (blocknr + i < devsize) { + page = cramfs_read_getpage(mapping, blocknr + i); + if (page) { + memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE); + kunmap_atomic(page, KM_USER0); + cramfs_read_putpage(page); + } else + memset(data, 0, PAGE_CACHE_SIZE); + } data += PAGE_CACHE_SIZE; } return read_buffers[buffer] + offset; @@ -501,8 +504,19 @@ static int cramfs_readpage(struct file * return 0; } +static int cramfs_releasepage(struct page *page, gfp_t mask) +{ + int ret = -ENOENT; + if (page) + ret = !PageChecked(page); + printk("cramfs_releasepage called for page %p checked %d\n", page, ret); + WARN_ON(1); + return ret; +} + static struct address_space_operations cramfs_aops = { - .readpage = cramfs_readpage + .readpage = cramfs_readpage, + .releasepage = cramfs_releasepage, }; /* ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-02 19:14 ` Olaf Hering @ 2006-06-02 19:41 ` Andrew Morton 2006-06-02 21:06 ` Olaf Hering 0 siblings, 1 reply; 23+ messages in thread From: Andrew Morton @ 2006-06-02 19:41 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel, viro On Fri, 2 Jun 2006 21:14:30 +0200 Olaf Hering <olh@suse.de> wrote: > On Fri, Jun 02, Andrew Morton wrote: > > > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked > > page and implement a_ops.releasepage(), which will fail if PageChecked(), > > and will succeed otherwise: > > > > /* > > * cramfs_releasepage() will fail if cramfs_read() set PG_checked. This > > * is so that invalidate_inode_pages() cannot zap the page while > > * cramfs_read() is trying to get at its contents. > > */ > > cramfs_releasepage(...) > > { > > if (PageChecked(page)) > > return 0; > > return 1; > > } > > This function is not triggered in my testing. > > > cramfs_read(...) > > { > > lock_page(page); > > SetPagePrivate(page); > > SetPageChecked(page); > > read_mapping_page(...); > > lock_page(page); > > if (page->mapping == NULL) { > > /* truncate got there first */ > > unlock_page(page); > > bale(); > > } > > memcpy(); > > ClearPageChecked(page); > > ClearPagePrivate(page); > > unlock_page(page); > > } > > recursive recursion? Where is page supposed to come from? Did you mean > to call all this with a page returned from read_mapping_page(), and > call read_mapping_page() twice? That was all very-pseudo code. > My version seems to leak memory somehow, Inactive and slab buffer_head > keeps growing. I guess something is missing in the picture. > > Doesnt read_cache_page lock the page already? > read_cache_page > __read_cache_page > add_to_page_cache_lru > add_to_page_cache > SetPageLocked read_cache_page() returns with the page locked if there's IO in flight against it. > > +static void cramfs_read_putpage(struct page *page) > +{ > + ClearPageChecked(page); > + ClearPagePrivate(page); > + unlock_page(page); > + page_cache_release(page); > +} > + > +/* return a page in PageUptodate state, BLKFLSBUF may have flushed the page */ > +static struct page *cramfs_read_getpage(struct address_space *m, unsigned int n) > +{ > + struct page *page; > + int readagain = 5; > +retry: > + page = read_cache_page(m, n, (filler_t *)m->a_ops->readpage, NULL); > + if (IS_ERR(page)) > + return NULL; > + lock_page(page); > + SetPagePrivate(page); > + SetPageChecked(page); > + if (PageUptodate(page)) > + return page; > + cramfs_read_putpage(page); > + printk("cramfs_read_getpage %d %p\n", 5 - readagain + 1, page); > + if (readagain--) > + goto retry; > + return NULL; > +} Oh, OK, we cannot use read_cache_page(). Because the above is still racy (and still needs the retry loop). We need to set PG_checked before launching the read. Something like: /* * Return a locked, uptodate, !PagePrivate, !PageChecked page which needs a * single put_page by the caller. */ cramfs_read_getpage() { page = find_lock_page() if (PageUptodate(page)) return page; SetPagePrivate() SetPageChecked() err = m->a_ops->readpage(page); if (err) { ClearPageChecked(page); ClearPagePrivate(page); put_page(page); return NULL; } lock_page(page); ClearPageChecked(page); ClearPagePrivate(page); if (page->mapping == NULL) { /* truncate got there first */ unlock_page(page); put_page(page); return NULL; } if (PageError(page)) { /* IO error */ unlock_page(page); put_page(page); return NULL; } return page; /* It's locked */ You'll see that the timing window here for invalidate_inode_pages() is very small - invalidate has to come in at the exact right time after the page has come unlocked so that its trylock wins the race against this function. So I'd suggest that to get the cramfs_releasepage() code path tested you'd need to do: wait_on_page_locked(); msleep(1); lock_page() above, to give invalidate a wider window to hit. > + if (blocknr + i < devsize) { > + page = cramfs_read_getpage(mapping, blocknr + i); > + if (page) { > + memcpy(data, kmap_atomic(page, KM_USER0), PAGE_CACHE_SIZE); > + kunmap_atomic(page, KM_USER0); kunmap_atomic() takes the virtual address, not a page* void *kaddr = kmap_atomic(...); memcpy(..., kaddr, ...); kunmap_atomic(kaddr); As to why it's leaking: dunno, sorry. It looks OK. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-02 19:41 ` Andrew Morton @ 2006-06-02 21:06 ` Olaf Hering 0 siblings, 0 replies; 23+ messages in thread From: Olaf Hering @ 2006-06-02 21:06 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, viro On Fri, Jun 02, Andrew Morton wrote: > Oh, OK, we cannot use read_cache_page(). Because the above is still racy > (and still needs the retry loop). We need to set PG_checked before > launching the read. Something like: > > /* > * Return a locked, uptodate, !PagePrivate, !PageChecked page which needs a > * single put_page by the caller. > */ > cramfs_read_getpage() > { > page = find_lock_page() > if (PageUptodate(page)) > return page; > SetPagePrivate() > SetPageChecked() > err = m->a_ops->readpage(page); I guess we have to put a find_or_create_page in there, and fill it. Will try that tomorrow. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-02 9:11 ` Andrew Morton 2006-06-02 19:14 ` Olaf Hering @ 2006-06-02 19:37 ` Olaf Hering 2006-06-02 19:46 ` Andrew Morton 1 sibling, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-06-02 19:37 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, viro On Fri, Jun 02, Andrew Morton wrote: > On Fri, 2 Jun 2006 10:43:27 +0200 > Olaf Hering <olh@suse.de> wrote: > > > On Thu, Jun 01, Andrew Morton wrote: > > > > > > > > Do you want it like that? > > > > > > > > lock_page(page); > > > > if (PageUptodate(page)) { > > > > SetPageDirty(page); > > > > mb(); > > > > return page; > > > > } > > > > > > Not really ;) It's hacky. It'd be better to take a lock. > > > > Which lock exactly? > > Ah, sorry, there isn't such a lock. I was just carrying on. > > > I'm not sure how to proceed from here. > > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked > page and implement a_ops.releasepage(), which will fail if PageChecked(), > and will succeed otherwise: No leak without tweaking PG_private. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-02 19:37 ` Olaf Hering @ 2006-06-02 19:46 ` Andrew Morton 2006-06-03 13:13 ` Olaf Hering 0 siblings, 1 reply; 23+ messages in thread From: Andrew Morton @ 2006-06-02 19:46 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel, viro On Fri, 2 Jun 2006 21:37:02 +0200 Olaf Hering <olh@suse.de> wrote: > > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked > > page and implement a_ops.releasepage(), which will fail if PageChecked(), > > and will succeed otherwise: > > No leak without tweaking PG_private. Odd. That would imply that PG_private is being left set somehow (it will make the page unreclaimable). But I don't see it. Plus if we have lots of PagePrivate() pages floating about you should see your ->releasepage() being called. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-02 19:46 ` Andrew Morton @ 2006-06-03 13:13 ` Olaf Hering 0 siblings, 0 replies; 23+ messages in thread From: Olaf Hering @ 2006-06-03 13:13 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, viro On Fri, Jun 02, Andrew Morton wrote: > On Fri, 2 Jun 2006 21:37:02 +0200 > Olaf Hering <olh@suse.de> wrote: > > > > I'd suggest you run SetPagePrivate() and SetPageChecked() on the locked > > > page and implement a_ops.releasepage(), which will fail if PageChecked(), > > > and will succeed otherwise: > > > > No leak without tweaking PG_private. > > Odd. That would imply that PG_private is being left set somehow (it will > make the page unreclaimable). But I don't see it. > > Plus if we have lots of PagePrivate() pages floating about you should see > your ->releasepage() being called. The change from Chris leaks as well, but very very slowly. Please wait while I dig into the page states... ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 18:49 ` [PATCH] " Olaf Hering 2006-06-01 19:12 ` Andrew Morton @ 2006-06-01 20:17 ` Chris Mason 2006-06-01 20:20 ` Olaf Hering 2006-09-20 13:20 ` Olaf Hering 2 siblings, 1 reply; 23+ messages in thread From: Chris Mason @ 2006-06-01 20:17 UTC (permalink / raw) To: Olaf Hering; +Cc: Andrew Morton, linux-kernel, Al Viro On Thursday 01 June 2006 14:49, Olaf Hering wrote: > This script will cause cramfs decompression errors, on SMP at least: > > #!/bin/bash > while :;do blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& > while :;do ps faxs </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do dmesg </dev/null &>/dev/null&done </dev/null &>/dev/null& > while :;do find /mounts/instsys -type f -print0|xargs -0 cat > &>/dev/null;done It looks as though: * cramfs_readpage is synchronous * cramfs_readpage always returns an up to date page * cramfs data doesn't change * read_cache_page as called by cramfs will always return a page that was up to date at one time, or an error. I think this will work (but have not tested it). Another option is to create a read_cache_page that pins the page via a page flag that invalidate_mapping_pages will honor. -chris diff -r a1a07af2d0cd fs/cramfs/inode.c --- a/fs/cramfs/inode.c Thu Jun 01 10:45:04 2006 -0400 +++ b/fs/cramfs/inode.c Thu Jun 01 15:04:39 2006 -0400 @@ -190,18 +190,6 @@ static void *cramfs_read(struct super_bl pages[i] = page; } - for (i = 0; i < BLKS_PER_BUF; i++) { - struct page *page = pages[i]; - if (page) { - wait_on_page_locked(page); - if (!PageUptodate(page)) { - /* asynchronous error */ - page_cache_release(page); - pages[i] = NULL; - } - } - } - buffer = next_buffer; next_buffer = NEXT_BUFFER(buffer); buffer_blocknr[buffer] = blocknr; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 20:17 ` Chris Mason @ 2006-06-01 20:20 ` Olaf Hering 2006-06-01 20:29 ` Chris Mason 0 siblings, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-06-01 20:20 UTC (permalink / raw) To: Chris Mason; +Cc: Andrew Morton, linux-kernel, Al Viro On Thu, Jun 01, Chris Mason wrote: > I think this will work (but have not tested it). Another option is to create > a read_cache_page that pins the page via a page flag > that invalidate_mapping_pages will honor. PageLocked or PageDirty, the latter only with a mb(). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 20:20 ` Olaf Hering @ 2006-06-01 20:29 ` Chris Mason 0 siblings, 0 replies; 23+ messages in thread From: Chris Mason @ 2006-06-01 20:29 UTC (permalink / raw) To: Olaf Hering; +Cc: Andrew Morton, linux-kernel, Al Viro On Thursday 01 June 2006 16:20, Olaf Hering wrote: > On Thu, Jun 01, Chris Mason wrote: > > I think this will work (but have not tested it). Another option is to > > create a read_cache_page that pins the page via a page flag > > that invalidate_mapping_pages will honor. > > PageLocked or PageDirty, the latter only with a mb(). The problem is we need the bit to be set before we set the page up to date. A locked page will never make it through the readpage() mechanisms and a dirty page that isn't up to date is not quite legal. For cramfs, dirty would work I suppose. -chris ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-06-01 18:49 ` [PATCH] " Olaf Hering 2006-06-01 19:12 ` Andrew Morton 2006-06-01 20:17 ` Chris Mason @ 2006-09-20 13:20 ` Olaf Hering 2006-09-20 18:47 ` Andrew Morton 2 siblings, 1 reply; 23+ messages in thread From: Olaf Hering @ 2006-09-20 13:20 UTC (permalink / raw) To: Andrew Morton, linux-kernel, Al Viro On Thu, Jun 01, Olaf Hering wrote: > ... > Error -3 while decompressing! > c0000000009592a2(2649)->c0000000edf87000(4096) > Error -3 while decompressing! > c000000000959298(2520)->c0000000edbc7000(4096) > Error -3 while decompressing! > c000000000959c70(2489)->c0000000f1482000(4096) > Error -3 while decompressing! > c00000000095a629(2355)->c0000000edaff000(4096) > Error -3 while decompressing! > ... Today I looked at this bug again and found that 2.6.18-rc6-git2 has fix for this. Is the patch below supposed to fix the cramfs corruption or does it just paper over the bug? ... cramfs_read() clears parts of the src buffer because the page is not uptodate. invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set ClearPageUptodate() after cramfs_read() got the page from read_cache_page() ... /root/cramfscrash.sh #!/bin/bash # cd /dev/shm/ # tar xfz /mounts/mirror/kernel/v2.6/linux-2.6.18.tar.gz # cd linux-2.6.18/ # mkfs.cramfs drivers /tmp/cramfs.image mount -vnt proc proc /proc echo 1 > /proc/sys/kernel/panic echo 9 > /proc/sysrq-trigger mount -vnt sysfs sysfs /sys modprobe -v loop mount -vnt cramfs -o loop /tmp/cramfs.image /mnt while :;do /sbin/blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& while :;do /usr/bin/find /mnt -type f -print0|xargs -0 cat &>/dev/null;done kernel cmdline xmon=off panic=1 sysrq=1 quiet root=/dev/disk/by-uuid/d50e4029-2e91-4332-bb16-24f946a74d3f ro init=/root/cramfscrash.sh 016eb4a0ed06a3677d67a584da901f0e9a63c666.patch From: Andrew Morton <akpm@osdl.org> If a CPU faults this page into pagetables after invalidate_mapping_pages() checked page_mapped(), invalidate_complete_page() will still proceed to remove the page from pagecache. This leaves the page-faulting process with a detached page. If it was MAP_SHARED then file data loss will ensue. Fix that up by checking the page's refcount after taking tree_lock. Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Hugh Dickins <hugh@veritas.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> --- mm/truncate.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff -puN mm/truncate.c~invalidate_complete_page-race-fix mm/truncate.c --- a/mm/truncate.c~invalidate_complete_page-race-fix +++ a/mm/truncate.c @@ -68,10 +68,10 @@ invalidate_complete_page(struct address_ return 0; write_lock_irq(&mapping->tree_lock); - if (PageDirty(page)) { - write_unlock_irq(&mapping->tree_lock); - return 0; - } + if (PageDirty(page)) + goto failed; + if (page_count(page) != 2) /* caller's ref + pagecache ref */ + goto failed; BUG_ON(PagePrivate(page)); __remove_from_page_cache(page); @@ -79,6 +79,9 @@ invalidate_complete_page(struct address_ ClearPageUptodate(page); page_cache_release(page); /* pagecache ref */ return 1; +failed: + write_unlock_irq(&mapping->tree_lock); + return 0; } /** _ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cramfs corruption after BLKFLSBUF on loop device 2006-09-20 13:20 ` Olaf Hering @ 2006-09-20 18:47 ` Andrew Morton 0 siblings, 0 replies; 23+ messages in thread From: Andrew Morton @ 2006-09-20 18:47 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel, Al Viro On Wed, 20 Sep 2006 15:20:11 +0200 Olaf Hering <olh@suse.de> wrote: > On Thu, Jun 01, Olaf Hering wrote: > > > ... > > Error -3 while decompressing! > > c0000000009592a2(2649)->c0000000edf87000(4096) > > Error -3 while decompressing! > > c000000000959298(2520)->c0000000edbc7000(4096) > > Error -3 while decompressing! > > c000000000959c70(2489)->c0000000f1482000(4096) > > Error -3 while decompressing! > > c00000000095a629(2355)->c0000000edaff000(4096) > > Error -3 while decompressing! > > ... > > Today I looked at this bug again and found that 2.6.18-rc6-git2 has > fix for this. Is the patch below supposed to fix the cramfs corruption > or does it just paper over the bug? > > ... > cramfs_read() clears parts of the src buffer because the page is not > uptodate. invalidate_bdev() called from block_ioctl(BLKFLSBUF) will set > ClearPageUptodate() after cramfs_read() got the page from read_cache_page() > ... > > /root/cramfscrash.sh > #!/bin/bash > # cd /dev/shm/ > # tar xfz /mounts/mirror/kernel/v2.6/linux-2.6.18.tar.gz > # cd linux-2.6.18/ > # mkfs.cramfs drivers /tmp/cramfs.image > mount -vnt proc proc /proc > echo 1 > /proc/sys/kernel/panic > echo 9 > /proc/sysrq-trigger That's a novel way of setting the log level. > mount -vnt sysfs sysfs /sys > modprobe -v loop > mount -vnt cramfs -o loop /tmp/cramfs.image /mnt > while :;do /sbin/blockdev --flushbufs /dev/loop0;done </dev/null &>/dev/null& > while :;do /usr/bin/find /mnt -type f -print0|xargs -0 cat &>/dev/null;done > > > kernel cmdline > xmon=off panic=1 sysrq=1 quiet root=/dev/disk/by-uuid/d50e4029-2e91-4332-bb16-24f946a74d3f ro init=/root/cramfscrash.sh > > > > 016eb4a0ed06a3677d67a584da901f0e9a63c666.patch > From: Andrew Morton <akpm@osdl.org> > > If a CPU faults this page into pagetables after invalidate_mapping_pages() > checked page_mapped(), invalidate_complete_page() will still proceed to remove > the page from pagecache. This leaves the page-faulting process with a > detached page. If it was MAP_SHARED then file data loss will ensue. > > Fix that up by checking the page's refcount after taking tree_lock. Yes, I think this is a reasonable solution to the cramfs race. Previously, invalidate_inode_pages() was removing pages from pagecache even if some other thread had a ref against them, which is pretty bad behaviour. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2006-09-20 18:47 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-05-29 21:40 cramfs corruption after BLKFLSBUF on loop device Olaf Hering 2006-05-30 13:19 ` Olaf Hering 2006-05-30 18:24 ` Olaf Hering 2006-06-01 18:49 ` [PATCH] " Olaf Hering 2006-06-01 19:12 ` Andrew Morton 2006-06-01 19:15 ` Andrew Morton 2006-06-01 20:10 ` Olaf Hering 2006-06-01 21:24 ` Andrew Morton 2006-06-01 21:41 ` Olaf Hering 2006-06-01 21:57 ` Andrew Morton 2006-06-02 8:43 ` Olaf Hering 2006-06-02 9:11 ` Andrew Morton 2006-06-02 19:14 ` Olaf Hering 2006-06-02 19:41 ` Andrew Morton 2006-06-02 21:06 ` Olaf Hering 2006-06-02 19:37 ` Olaf Hering 2006-06-02 19:46 ` Andrew Morton 2006-06-03 13:13 ` Olaf Hering 2006-06-01 20:17 ` Chris Mason 2006-06-01 20:20 ` Olaf Hering 2006-06-01 20:29 ` Chris Mason 2006-09-20 13:20 ` Olaf Hering 2006-09-20 18:47 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox