* Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 16:17 Nick Piggin
2007-03-15 19:32 ` Joel Becker
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Nick Piggin @ 2007-03-15 16:17 UTC (permalink / raw)
To: Linux Filesystems, Mark Fasheh
Cc: reiserfs-list, linux-ext4, xfs, nfs, cluster-devel,
jfs-discussion
OK, I've gone through and fixed several bugs until the thing actually
survives fsx-linux for both ext2 and ext3 ordered and writeback (both
when using the new aops, and the legacy prepare_write path). Actually
ext3 sometimes breaks, but it does in unpatched kernels anyway.
At 15 patches (including the initial buffered write deadlock fixes),
it is too much to keep posting -- not much has fundamentally changed,
so I'll just post occasionally if we make big changes. The quilt
format is probably easier for someone wishing to work on it anyway.
http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
(excludes the OCFS2 patch that Mark sent, in anticipation of an update)
It would be really nice if filesystem developers could take a look
at the new interfaces some time, because otherwise they might get stuck
with it :) So I'm cc'ing a few filesystems that come to mind, that I
haven't heard anything from.
Thanks,
Nick
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 16:17 Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin @ 2007-03-15 19:32 ` Joel Becker 2007-03-15 19:57 ` Nick Piggin 2007-03-15 19:53 ` Mark Fasheh ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Joel Becker @ 2007-03-15 19:32 UTC (permalink / raw) To: Nick Piggin Cc: jfs-discussion, Mark Fasheh, xfs, cluster-devel, reiserfs-list, nfs, Linux Filesystems, linux-ext4 On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote: > At 15 patches (including the initial buffered write deadlock fixes), > it is too much to keep posting -- not much has fundamentally changed, > so I'll just post occasionally if we make big changes. The quilt > format is probably easier for someone wishing to work on it anyway. > > http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/ For future drops, can you provide the unpacked patches too, so lazy people like me can read them in the browser? Thanks. Joel -- "Here's something to think about: How come you never see a headline like ``Psychic Wins Lottery''?" - Jay Leno Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 19:32 ` Joel Becker @ 2007-03-15 19:57 ` Nick Piggin 0 siblings, 0 replies; 10+ messages in thread From: Nick Piggin @ 2007-03-15 19:57 UTC (permalink / raw) To: Joel Becker Cc: Linux Filesystems, Mark Fasheh, reiserfs-list, linux-ext4, xfs, nfs, cluster-devel, jfs-discussion On Thu, Mar 15, 2007 at 12:32:45PM -0700, Joel Becker wrote: > On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote: > > At 15 patches (including the initial buffered write deadlock fixes), > > it is too much to keep posting -- not much has fundamentally changed, > > so I'll just post occasionally if we make big changes. The quilt > > format is probably easier for someone wishing to work on it anyway. > > > > http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/ > > For future drops, can you provide the unpacked patches too, so > lazy people like me can read them in the browser? Thanks. Sorry, I did intend to unpack that, but forgot. It's done now, the new directory containing the patches is under the same URL as above. Thanks, Nick ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 16:17 Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin 2007-03-15 19:32 ` Joel Becker @ 2007-03-15 19:53 ` Mark Fasheh 2007-03-15 19:57 ` Nick Piggin 2007-03-15 21:08 ` Mark Fasheh 2007-03-15 23:47 ` Mark Fasheh 3 siblings, 1 reply; 10+ messages in thread From: Mark Fasheh @ 2007-03-15 19:53 UTC (permalink / raw) To: Nick Piggin Cc: jfs-discussion, xfs, cluster-devel, reiserfs-list, nfs, Linux Filesystems, linux-ext4 On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote: > OK, I've gone through and fixed several bugs until the thing actually > survives fsx-linux for both ext2 and ext3 ordered and writeback (both > when using the new aops, and the legacy prepare_write path). Actually > ext3 sometimes breaks, but it does in unpatched kernels anyway. > > At 15 patches (including the initial buffered write deadlock fixes), > it is too much to keep posting -- not much has fundamentally changed, > so I'll just post occasionally if we make big changes. The quilt > format is probably easier for someone wishing to work on it anyway. Hmm, we still left out some exports... --Mark -- Mark Fasheh Senior Software Developer, Oracle mark.fasheh@oracle.com From: Mark Fasheh <mark.fasheh@oracle.com> [PATCH] Export simple_write_begin, simple_write_end These are used by configfs, which can be built as a module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> --- fs/libfs.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) 36f5d6a135c9f3f30fee3d0e4ffa887e1803ac95 diff --git a/fs/libfs.c b/fs/libfs.c index d687819..51f9748 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -656,6 +656,8 @@ EXPORT_SYMBOL(dcache_dir_open); EXPORT_SYMBOL(dcache_readdir); EXPORT_SYMBOL(generic_read_dir); EXPORT_SYMBOL(get_sb_pseudo); +EXPORT_SYMBOL(simple_write_begin); +EXPORT_SYMBOL(simple_write_end); EXPORT_SYMBOL(simple_commit_write); EXPORT_SYMBOL(simple_dir_inode_operations); EXPORT_SYMBOL(simple_dir_operations); -- 1.3.3 ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 19:53 ` Mark Fasheh @ 2007-03-15 19:57 ` Nick Piggin 0 siblings, 0 replies; 10+ messages in thread From: Nick Piggin @ 2007-03-15 19:57 UTC (permalink / raw) To: Mark Fasheh Cc: jfs-discussion, xfs, cluster-devel, reiserfs-list, nfs, Linux Filesystems, linux-ext4 On Thu, Mar 15, 2007 at 12:53:51PM -0700, Mark Fasheh wrote: > On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote: > > OK, I've gone through and fixed several bugs until the thing actually > > survives fsx-linux for both ext2 and ext3 ordered and writeback (both > > when using the new aops, and the legacy prepare_write path). Actually > > ext3 sometimes breaks, but it does in unpatched kernels anyway. > > > > At 15 patches (including the initial buffered write deadlock fixes), > > it is too much to keep posting -- not much has fundamentally changed, > > so I'll just post occasionally if we make big changes. The quilt > > format is probably easier for someone wishing to work on it anyway. > > Hmm, we still left out some exports... Thanks, applied. ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 16:17 Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin 2007-03-15 19:32 ` Joel Becker 2007-03-15 19:53 ` Mark Fasheh @ 2007-03-15 21:08 ` Mark Fasheh 2007-03-15 23:47 ` Mark Fasheh 3 siblings, 0 replies; 10+ messages in thread From: Mark Fasheh @ 2007-03-15 21:08 UTC (permalink / raw) To: Nick Piggin Cc: jfs-discussion, xfs, cluster-devel, reiserfs-list, nfs, Linux Filesystems, linux-ext4 On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote: > OK, I've gone through and fixed several bugs until the thing actually > survives fsx-linux for both ext2 and ext3 ordered and writeback (both > when using the new aops, and the legacy prepare_write path). Actually > ext3 sometimes breaks, but it does in unpatched kernels anyway. Attached is a bugfix for a crash folks who use an initrd will hit early on. --Mark -- Mark Fasheh Senior Software Developer, Oracle mark.fasheh@oracle.com From: Mark Fasheh <mark.fasheh@oracle.com> [PATCH] Populate pagep in simple_write_begin() This wasn't getting passed back to callers. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> cbf20bf51ddd6434db935ba29f845a85f3b1ec65 diff --git a/fs/libfs.c b/fs/libfs.c index 51f9748..602496a 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -357,6 +357,8 @@ int simple_write_begin(struct file *file if (!page) return -ENOMEM; + *pagep = page; + return simple_prepare_write(file, page, from, from+len); } -- 1.3.3 ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 16:17 Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin ` (2 preceding siblings ...) 2007-03-15 21:08 ` Mark Fasheh @ 2007-03-15 23:47 ` Mark Fasheh 2007-03-20 5:36 ` Nick Piggin 2007-04-01 5:42 ` Nick Piggin 3 siblings, 2 replies; 10+ messages in thread From: Mark Fasheh @ 2007-03-15 23:47 UTC (permalink / raw) To: Nick Piggin Cc: jfs-discussion, xfs, cluster-devel, reiserfs-list, nfs, Linux Filesystems, linux-ext4 [-- Attachment #1: Type: text/plain, Size: 464 bytes --] On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote: > (excludes the OCFS2 patch that Mark sent, in anticipation of an update) Attached is said patch. I needed to export __grab_cache_page (ext2/ext3 also need this if they're to be built as modules), so a patch to do that is also attached. This passed some preliminary testing on a two node cluster I have here at Oracle. --Mark -- Mark Fasheh Senior Software Developer, Oracle mark.fasheh@oracle.com [-- Attachment #2: 0001-ocfs2-Convert-to-new-aops.txt --] [-- Type: text/plain, Size: 5478 bytes --] From: Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Convert to new aops Turn ocfs2_prepare_write() and ocfs2_commit_write() into ocfs2_write_begin() and ocfs2_write_end(). This conveniently eliminates the need for AOP_TRUNCATED_PAGE during write. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> e28911070b02362a9a3a543646da84a8fbf9f63b diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 875c114..cbec0e1 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -293,29 +293,67 @@ int ocfs2_prepare_write_nolock(struct in } /* - * ocfs2_prepare_write() can be an outer-most ocfs2 call when it is called - * from loopback. It must be able to perform its own locking around - * ocfs2_get_block(). + * ocfs2_write_begin() can be an outer-most ocfs2 call when it is + * called from elsewhere in the kernel. It must be able to perform its + * own locking around ocfs2_get_block(). */ -static int ocfs2_prepare_write(struct file *file, struct page *page, - unsigned from, unsigned to) +static int ocfs2_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, + struct page **pagep, void **fsdata) { - struct inode *inode = page->mapping->host; + struct inode *inode = mapping->host; + struct buffer_head *di_bh = NULL; + struct page *page = NULL; int ret; - mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to); - - ret = ocfs2_meta_lock_with_page(inode, NULL, 0, page); + ret = ocfs2_meta_lock(inode, &di_bh, 1); if (ret != 0) { mlog_errno(ret); + return ret; + } + + ret = ocfs2_data_lock(inode, 1); + if (ret) { + ocfs2_meta_unlock(inode, 1); + + mlog_errno(ret); + return ret; + } + + /* + * Lock the page out here to preserve ordering with + * ip_alloc_sem. + */ + page = __grab_cache_page(mapping, pos >> PAGE_CACHE_SHIFT); + if (!page) { + ret = -ENOMEM; + mlog_errno(ret); goto out; } - ret = ocfs2_prepare_write_nolock(inode, page, from, to); + *pagep = page; - ocfs2_meta_unlock(inode, 0); + down_read(&OCFS2_I(inode)->ip_alloc_sem); + ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata, + ocfs2_get_block); + up_read(&OCFS2_I(inode)->ip_alloc_sem); out: - mlog_exit(ret); + if (ret == 0) { + *fsdata = di_bh; + } else { + /* + * Error return - the caller won't call + * ocfs2_write_end, so drop cluster locks here. + */ + brelse(di_bh); + if (page) { + unlock_page(page); + page_cache_release(page); + } + ocfs2_data_unlock(inode, 1); + ocfs2_meta_unlock(inode, 1); + } + return ret; } @@ -388,16 +426,18 @@ out: return handle; } -static int ocfs2_commit_write(struct file *file, struct page *page, - unsigned from, unsigned to) +static int ocfs2_write_end(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) { int ret; - struct buffer_head *di_bh = NULL; + unsigned from, to; + struct buffer_head *di_bh = fsdata; struct inode *inode = page->mapping->host; handle_t *handle = NULL; struct ocfs2_dinode *di; - mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to); + mlog_entry("(0x%p, 0x%p)\n", file, page); /* NOTE: ocfs2_file_aio_write has ensured that it's safe for * us to continue here without rechecking the I/O against @@ -412,22 +452,13 @@ static int ocfs2_commit_write(struct fil * stale inode allocation image (i_size, i_clusters, etc). */ - ret = ocfs2_meta_lock_with_page(inode, &di_bh, 1, page); - if (ret != 0) { - mlog_errno(ret); - goto out; - } - - ret = ocfs2_data_lock_with_page(inode, 1, page); - if (ret != 0) { - mlog_errno(ret); - goto out_unlock_meta; - } + from = pos & (PAGE_CACHE_SIZE - 1); + to = from + len; handle = ocfs2_start_walk_page_trans(inode, page, from, to); if (IS_ERR(handle)) { ret = PTR_ERR(handle); - goto out_unlock_data; + goto out_unlock; } /* Mark our buffer early. We'd rather catch this error up here @@ -441,8 +472,10 @@ static int ocfs2_commit_write(struct fil } /* might update i_size */ - ret = generic_commit_write(file, page, from, to); - if (ret < 0) { + copied = block_write_end(file, mapping, pos, len, copied, page, fsdata); + if (copied < 0) { + ret = copied; + copied = 0; mlog_errno(ret); goto out_commit; } @@ -458,23 +491,30 @@ static int ocfs2_commit_write(struct fil di->i_size = cpu_to_le64((u64)i_size_read(inode)); ret = ocfs2_journal_dirty(handle, di_bh); - if (ret < 0) { + if (ret < 0) mlog_errno(ret); - goto out_commit; - } + ret = 0; out_commit: ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle); -out_unlock_data: +out_unlock: ocfs2_data_unlock(inode, 1); -out_unlock_meta: ocfs2_meta_unlock(inode, 1); -out: + + if (ret) { + /* + * We caught an error before block_write_end() - + * unlock and free the page. + */ + unlock_page(page); + page_cache_release(page); + } + if (di_bh) brelse(di_bh); mlog_exit(ret); - return ret; + return copied ? copied : ret; } static sector_t ocfs2_bmap(struct address_space *mapping, sector_t block) @@ -678,8 +718,8 @@ out: const struct address_space_operations ocfs2_aops = { .readpage = ocfs2_readpage, .writepage = ocfs2_writepage, - .prepare_write = ocfs2_prepare_write, - .commit_write = ocfs2_commit_write, + .write_begin = ocfs2_write_begin, + .write_end = ocfs2_write_end, .bmap = ocfs2_bmap, .sync_page = block_sync_page, .direct_IO = ocfs2_direct_IO, -- 1.3.3 [-- Attachment #3: 0002-Export-__grab_cache_page.txt --] [-- Type: text/plain, Size: 521 bytes --] From: Mark Fasheh <mark.fasheh@oracle.com> [PATCH] Export __grab_cache_page Needed at least by ocfs2 and ext[23]. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> ec4c66f0e6012a182105405aa11813fbf836629f diff --git a/mm/filemap.c b/mm/filemap.c index 327c20f..c4a2d68 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2196,6 +2196,7 @@ repeat: } return page; } +EXPORT_SYMBOL(__grab_cache_page); static ssize_t generic_perform_write_2copy(struct file *file, struct iov_iter *i, loff_t pos) -- 1.3.3 [-- Attachment #4: Type: text/plain, Size: 345 bytes --] ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV [-- Attachment #5: Type: text/plain, Size: 140 bytes --] _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 23:47 ` Mark Fasheh @ 2007-03-20 5:36 ` Nick Piggin 2007-04-01 5:42 ` Nick Piggin 1 sibling, 0 replies; 10+ messages in thread From: Nick Piggin @ 2007-03-20 5:36 UTC (permalink / raw) To: Mark Fasheh Cc: Linux Filesystems, reiserfs-list, linux-ext4, xfs, nfs, cluster-devel, jfs-discussion On Thu, Mar 15, 2007 at 04:47:13PM -0700, Mark Fasheh wrote: > On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote: > > (excludes the OCFS2 patch that Mark sent, in anticipation of an update) > > Attached is said patch. I needed to export __grab_cache_page (ext2/ext3 also > need this if they're to be built as modules), so a patch to do that is also > attached. > > This passed some preliminary testing on a two node cluster I have here at > Oracle. Thanks Mark, I've merged these. Nick ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-03-15 23:47 ` Mark Fasheh 2007-03-20 5:36 ` Nick Piggin @ 2007-04-01 5:42 ` Nick Piggin 2007-04-02 2:14 ` Nick Piggin 1 sibling, 1 reply; 10+ messages in thread From: Nick Piggin @ 2007-04-01 5:42 UTC (permalink / raw) To: Mark Fasheh; +Cc: Linux Filesystems, Steven Whitehouse Hi, Progress is coming along: as well as the several fixes Mark has posted, we've got a GFS2 patch from Steve as well, which also enables us to get rid of the AOPS_TRUNCATED_PAGE handling in the legacy ->prepare_write paths. And I've been auditing code and doing stress and code coverage tests for various error paths... I was hoping to have a cont_prepare_write filesystem converted before doing a next patchset, but there are enough fixes and new stuff now that I'll probably do one tomorrow. Anyway my recent testing involved things like poisioning the memory of new buffers, and injecting failures into various places (eg. the atomic usercopies), then running fsx-linux and stuff on top of ext2 with various block sizes. This found one subtle problem (the first, below), and inspection found a couple of others. With these fixes, the stress tests now survive as long as I've cared to wait. Any comments always appreciated. Thanks, Nick -- - The first hunk fixes a case where new buffers that were only partially cleared (because the write partially overlaps a buffer, so that region is not cleared), and then the write fails before filling up this portion of the buffer. Because we have already cleared buffer_new, page_zero_new_buffers does not clear it properly. This results in uninitialised data going onto disk (and in pagecache). The solution is to only clear_buffer_new when we know everything has been initialised, which makes sense. - The second hunk fixes a thinko. - The last hunk fixes a problem where we turn any short write to a !uptodate page into a zero-length write, for the reason explained in comments. However it was not taking this contraction into account in page_zero_new_buffers and __block_commit_write, which is inconsistent. Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1852,17 +1852,8 @@ static int __block_prepare_write(struct if (!buffer_uptodate(*wait_bh)) err = -EIO; } - if (!err) { - bh = head; - do { - if (buffer_new(bh)) - clear_buffer_new(bh); - } while ((bh = bh->b_this_page) != head); - return 0; - } - - /* Error case: */ - page_zero_new_buffers(page, from, to); + if (unlikely(err)) + page_zero_new_buffers(page, from, to); return err; } @@ -1895,7 +1886,7 @@ void page_zero_new_buffers(struct page * end = min(to, block_end); kaddr = kmap_atomic(page, KM_USER0); - memset(kaddr+start, 0, block_end-end); + memset(kaddr+start, 0, end - start); flush_dcache_page(page); kunmap_atomic(kaddr, KM_USER0); set_buffer_uptodate(bh); @@ -2015,30 +2006,29 @@ int block_write_end(struct file *file, s start = pos & (PAGE_CACHE_SIZE - 1); - if (unlikely(copied < len)) + if (unlikely(copied < len)) { + /* + * The buffers that were written will now be uptodate, so we + * don't have to worry about a readpage reading them and + * overwriting a partial write. However if we have encountered + * a short write and only partially written into a buffer, it + * will not be marked uptodate, so a readpage might come in and + * destroy our partial write. + * + * Do the simplest thing, and just treat any short write to a + * non uptodate page as a zero-length write, and force the + * caller to redo the whole thing. + */ + if (!PageUptodate(page)) + copied = 0; + page_zero_new_buffers(page, start+copied, start+len); + } flush_dcache_page(page); /* This could be a short (even 0-length) commit */ __block_commit_write(inode, page, start, start+copied); - /* - * The buffers that were written will now be uptodate, so we don't - * have to worry about a readpage reading them and overwriting - * a partial write. However if we have encountered a short write - * and only partially written into a buffer, it will not be marked - * uptodate, so a readpage might come in and destroy our partial - * write. - * - * Do the simplest thing, and just treat any short write to a non - * uptodate page as a zero-length write, and force the caller to - * redo the whole thing. - */ - if (unlikely(copied < len)) { - if (!PageUptodate(page)) - copied = 0; - } - unlock_page(page); mark_page_accessed(page); /* XXX: put this in caller? */ page_cache_release(page); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Announce: new-aops-1 for 2.6.21-rc3 2007-04-01 5:42 ` Nick Piggin @ 2007-04-02 2:14 ` Nick Piggin 0 siblings, 0 replies; 10+ messages in thread From: Nick Piggin @ 2007-04-02 2:14 UTC (permalink / raw) To: Mark Fasheh; +Cc: Linux Filesystems, Steven Whitehouse Hmm, spotted another bug in here too, which also affects mainline AFAIKS. If the page is already uptodate, then a new buffer will get marked uptodate but will remain "new"... thus if the read portion of the RMW operation fails, then the error handler will zero this guy out "to prevent stale data exposure". Of course, the data is not stale because it is uptodate, so zeroing out will cause data loss. I fixed it thusly, because I think buffer_new should always be cleared before marking a buffer uptodate. I don't know whether I'll bother submitting this seperately from the new-aops lineup, but if anyone would like to, feel free to add my SOB. -- Index: linux-2.6/fs/buffer.c =================================================================== --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -1813,7 +1813,9 @@ static int __block_prepare_write(struct unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr); if (PageUptodate(page)) { + clear_buffer_new(bh); set_buffer_uptodate(bh); + mark_buffer_dirty(bh); continue; } if (block_end > to || block_start < from) { ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-04-02 2:14 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-15 16:17 Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin 2007-03-15 19:32 ` Joel Becker 2007-03-15 19:57 ` Nick Piggin 2007-03-15 19:53 ` Mark Fasheh 2007-03-15 19:57 ` Nick Piggin 2007-03-15 21:08 ` Mark Fasheh 2007-03-15 23:47 ` Mark Fasheh 2007-03-20 5:36 ` Nick Piggin 2007-04-01 5:42 ` Nick Piggin 2007-04-02 2:14 ` Nick Piggin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).