* [Cluster-devel] Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 16:17 Nick Piggin
From: Nick Piggin @ 2007-03-15 16:17 UTC (permalink / raw)
To: cluster-devel.redhat.com

OK, I've gone through and fixed several bugs until the thing actually
survives fsx-linux for both ext2 and ext3 ordered and writeback (both
when using the new aops, and the legacy prepare_write path). Actually
ext3 sometimes breaks, but it does in unpatched kernels anyway.

At 15 patches (including the initial buffered write deadlock fixes),
it is too much to keep posting -- not much has fundamentally changed,
so I'll just post occasionally if we make big changes. The quilt
format is probably easier for someone wishing to work on it anyway.

http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
(excludes the OCFS2 patch that Mark sent, in anticipation of an update)

It would be really nice if filesystem developers could take a look
at the new interfaces some time, because otherwise they might get stuck
with it :) So I'm cc'ing a few filesystems that come to mind, that I
haven't heard anything from.

Thanks,
Nick
* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 19:32 Joel Becker
From: Joel Becker @ 2007-03-15 19:32 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> At 15 patches (including the initial buffered write deadlock fixes),
> it is too much to keep posting -- not much has fundamentally changed,
> so I'll just post occasionally if we make big changes. The quilt
> format is probably easier for someone wishing to work on it anyway.
>
> http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/

	For future drops, can you provide the unpacked patches too, so
lazy people like me can read them in the browser? Thanks.

Joel

--
"Here's something to think about: How come you never see a headline
 like ``Psychic Wins Lottery''?"
	- Jay Leno

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 19:57 Nick Piggin
From: Nick Piggin @ 2007-03-15 19:57 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 12:32:45PM -0700, Joel Becker wrote:
> On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> > At 15 patches (including the initial buffered write deadlock fixes),
> > it is too much to keep posting -- not much has fundamentally changed,
> > so I'll just post occasionally if we make big changes. The quilt
> > format is probably easier for someone wishing to work on it anyway.
> >
> > http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
>
> 	For future drops, can you provide the unpacked patches too, so
> lazy people like me can read them in the browser? Thanks.

Sorry, I did intend to unpack that, but forgot. It's done now, the new
directory containing the patches is under the same URL as above.

Thanks,
Nick
* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 19:53 Mark Fasheh
From: Mark Fasheh @ 2007-03-15 19:53 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> OK, I've gone through and fixed several bugs until the thing actually
> survives fsx-linux for both ext2 and ext3 ordered and writeback (both
> when using the new aops, and the legacy prepare_write path). Actually
> ext3 sometimes breaks, but it does in unpatched kernels anyway.
>
> At 15 patches (including the initial buffered write deadlock fixes),
> it is too much to keep posting -- not much has fundamentally changed,
> so I'll just post occasionally if we make big changes. The quilt
> format is probably easier for someone wishing to work on it anyway.

Hmm, we still left out some exports...
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com

From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Export simple_write_begin, simple_write_end

These are used by configfs, which can be built as a module.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

---

 fs/libfs.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

36f5d6a135c9f3f30fee3d0e4ffa887e1803ac95
diff --git a/fs/libfs.c b/fs/libfs.c
index d687819..51f9748 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -656,6 +656,8 @@ EXPORT_SYMBOL(dcache_dir_open);
 EXPORT_SYMBOL(dcache_readdir);
 EXPORT_SYMBOL(generic_read_dir);
 EXPORT_SYMBOL(get_sb_pseudo);
+EXPORT_SYMBOL(simple_write_begin);
+EXPORT_SYMBOL(simple_write_end);
 EXPORT_SYMBOL(simple_commit_write);
 EXPORT_SYMBOL(simple_dir_inode_operations);
 EXPORT_SYMBOL(simple_dir_operations);
--
1.3.3
* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 19:57 Nick Piggin
From: Nick Piggin @ 2007-03-15 19:57 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 12:53:51PM -0700, Mark Fasheh wrote:
> On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> > OK, I've gone through and fixed several bugs until the thing actually
> > survives fsx-linux for both ext2 and ext3 ordered and writeback (both
> > when using the new aops, and the legacy prepare_write path). Actually
> > ext3 sometimes breaks, but it does in unpatched kernels anyway.
> >
> > At 15 patches (including the initial buffered write deadlock fixes),
> > it is too much to keep posting -- not much has fundamentally changed,
> > so I'll just post occasionally if we make big changes. The quilt
> > format is probably easier for someone wishing to work on it anyway.
>
> Hmm, we still left out some exports...

Thanks, applied.
* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 21:08 Mark Fasheh
From: Mark Fasheh @ 2007-03-15 21:08 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> OK, I've gone through and fixed several bugs until the thing actually
> survives fsx-linux for both ext2 and ext3 ordered and writeback (both
> when using the new aops, and the legacy prepare_write path). Actually
> ext3 sometimes breaks, but it does in unpatched kernels anyway.

Attached is a bugfix for a crash folks who use an initrd will hit early on.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com

From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Populate pagep in simple_write_begin()

This wasn't getting passed back to callers.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

cbf20bf51ddd6434db935ba29f845a85f3b1ec65
diff --git a/fs/libfs.c b/fs/libfs.c
index 51f9748..602496a 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -357,6 +357,8 @@ int simple_write_begin(struct file *file
 	if (!page)
 		return -ENOMEM;
 
+	*pagep = page;
+
 	return simple_prepare_write(file, page, from, from+len);
 }
 
--
1.3.3
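
The contract that the pagep fix above restores can be illustrated outside
the kernel. What follows is a minimal userspace sketch, not the kernel code:
demo_page, demo_write_begin and demo_copy_in are hypothetical names, and
calloc() merely stands in for grabbing a page-cache page. It shows why a
caller depends on *pagep being set before write_begin returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical stand-in for the kernel's struct page. */
struct demo_page {
	char data[DEMO_PAGE_SIZE];
};

/*
 * Models the write_begin contract: on success (return 0) the callee
 * must hand the page back to the caller through *pagep.
 */
int demo_write_begin(unsigned from, unsigned len, struct demo_page **pagep)
{
	struct demo_page *page;

	/* Reject ranges that do not fit inside a single page. */
	if (from >= DEMO_PAGE_SIZE || len > DEMO_PAGE_SIZE - from)
		return -EINVAL;

	page = calloc(1, sizeof(*page));
	if (!page)
		return -ENOMEM;

	*pagep = page;	/* the assignment the fix adds */
	return 0;
}

/*
 * Models the caller: it only learns about the page through the out
 * parameter, so a missing assignment would leave it holding NULL.
 */
int demo_copy_in(const char *buf, unsigned from, unsigned len,
		 struct demo_page **out)
{
	struct demo_page *page = NULL;
	int ret = demo_write_begin(from, len, &page);

	if (ret)
		return ret;

	memcpy(page->data + from, buf, len);
	*out = page;
	return 0;
}
```

If the marked assignment is dropped, demo_copy_in() dereferences a NULL
page, which is the userspace analogue of the early crash described above.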
* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 23:47 Mark Fasheh
From: Mark Fasheh @ 2007-03-15 23:47 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> (excludes the OCFS2 patch that Mark sent, in anticipation of an update)

Attached is said patch. I needed to export __grab_cache_page (ext2/ext3 also
need this if they're to be built as modules), so a patch to do that is also
attached.

This passed some preliminary testing on a two node cluster I have here at
Oracle.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com

-------------- next part --------------
From: Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Convert to new aops

Turn ocfs2_prepare_write() and ocfs2_commit_write() into ocfs2_write_begin()
and ocfs2_write_end(). This conveniently eliminates the need for
AOP_TRUNCATED_PAGE during write.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

e28911070b02362a9a3a543646da84a8fbf9f63b
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 875c114..cbec0e1 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -293,29 +293,67 @@ int ocfs2_prepare_write_nolock(struct in
 }
 
 /*
- * ocfs2_prepare_write() can be an outer-most ocfs2 call when it is called
- * from loopback. It must be able to perform its own locking around
- * ocfs2_get_block().
+ * ocfs2_write_begin() can be an outer-most ocfs2 call when it is
+ * called from elsewhere in the kernel. It must be able to perform its
+ * own locking around ocfs2_get_block().
  */
-static int ocfs2_prepare_write(struct file *file, struct page *page,
-			       unsigned from, unsigned to)
+static int ocfs2_write_begin(struct file *file, struct address_space *mapping,
+			     loff_t pos, unsigned len, unsigned flags,
+			     struct page **pagep, void **fsdata)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = mapping->host;
+	struct buffer_head *di_bh = NULL;
+	struct page *page = NULL;
 	int ret;
 
-	mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to);
-
-	ret = ocfs2_meta_lock_with_page(inode, NULL, 0, page);
+	ret = ocfs2_meta_lock(inode, &di_bh, 1);
 	if (ret != 0) {
 		mlog_errno(ret);
+		return ret;
+	}
+
+	ret = ocfs2_data_lock(inode, 1);
+	if (ret) {
+		ocfs2_meta_unlock(inode, 1);
+
+		mlog_errno(ret);
+		return ret;
+	}
+
+	/*
+	 * Lock the page out here to preserve ordering with
+	 * ip_alloc_sem.
+	 */
+	page = __grab_cache_page(mapping, pos >> PAGE_CACHE_SHIFT);
+	if (!page) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
 		goto out;
 	}
 
-	ret = ocfs2_prepare_write_nolock(inode, page, from, to);
+	*pagep = page;
 
-	ocfs2_meta_unlock(inode, 0);
+	down_read(&OCFS2_I(inode)->ip_alloc_sem);
+	ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				ocfs2_get_block);
+	up_read(&OCFS2_I(inode)->ip_alloc_sem);
 out:
-	mlog_exit(ret);
+	if (ret == 0) {
+		*fsdata = di_bh;
+	} else {
+		/*
+		 * Error return - the caller won't call
+		 * ocfs2_write_end, so drop cluster locks here.
+		 */
+		brelse(di_bh);
+		if (page) {
+			unlock_page(page);
+			page_cache_release(page);
+		}
+		ocfs2_data_unlock(inode, 1);
+		ocfs2_meta_unlock(inode, 1);
+	}
+
 	return ret;
 }
 
@@ -388,16 +426,18 @@ out:
 	return handle;
 }
 
-static int ocfs2_commit_write(struct file *file, struct page *page,
-			      unsigned from, unsigned to)
+static int ocfs2_write_end(struct file *file, struct address_space *mapping,
+			   loff_t pos, unsigned len, unsigned copied,
+			   struct page *page, void *fsdata)
 {
 	int ret;
-	struct buffer_head *di_bh = NULL;
+	unsigned from, to;
+	struct buffer_head *di_bh = fsdata;
 	struct inode *inode = page->mapping->host;
 	handle_t *handle = NULL;
 	struct ocfs2_dinode *di;
 
-	mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to);
+	mlog_entry("(0x%p, 0x%p)\n", file, page);
 
 	/* NOTE: ocfs2_file_aio_write has ensured that it's safe for
 	 * us to continue here without rechecking the I/O against
@@ -412,22 +452,13 @@ static int ocfs2_commit_write(struct fil
 	 * stale inode allocation image (i_size, i_clusters, etc).
 	 */
 
-	ret = ocfs2_meta_lock_with_page(inode, &di_bh, 1, page);
-	if (ret != 0) {
-		mlog_errno(ret);
-		goto out;
-	}
-
-	ret = ocfs2_data_lock_with_page(inode, 1, page);
-	if (ret != 0) {
-		mlog_errno(ret);
-		goto out_unlock_meta;
-	}
+	from = pos & (PAGE_CACHE_SIZE - 1);
+	to = from + len;
 
 	handle = ocfs2_start_walk_page_trans(inode, page, from, to);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
-		goto out_unlock_data;
+		goto out_unlock;
 	}
 
 	/* Mark our buffer early. We'd rather catch this error up here
@@ -441,8 +472,10 @@ static int ocfs2_commit_write(struct fil
 	}
 
 	/* might update i_size */
-	ret = generic_commit_write(file, page, from, to);
-	if (ret < 0) {
+	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
+	if (copied < 0) {
+		ret = copied;
+		copied = 0;
 		mlog_errno(ret);
 		goto out_commit;
 	}
@@ -458,23 +491,30 @@ static int ocfs2_commit_write(struct fil
 	di->i_size = cpu_to_le64((u64)i_size_read(inode));
 
 	ret = ocfs2_journal_dirty(handle, di_bh);
-	if (ret < 0) {
+	if (ret < 0)
 		mlog_errno(ret);
-		goto out_commit;
-	}
+	ret = 0;
 
 out_commit:
 	ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
-out_unlock_data:
+out_unlock:
 	ocfs2_data_unlock(inode, 1);
-out_unlock_meta:
 	ocfs2_meta_unlock(inode, 1);
-out:
+
+	if (ret) {
+		/*
+		 * We caught an error before block_write_end() -
+		 * unlock and free the page.
+		 */
+		unlock_page(page);
+		page_cache_release(page);
+	}
+
 	if (di_bh)
 		brelse(di_bh);
 
 	mlog_exit(ret);
-	return ret;
+	return copied ? copied : ret;
 }
 
 static sector_t ocfs2_bmap(struct address_space *mapping, sector_t block)
@@ -678,8 +718,8 @@ out:
 const struct address_space_operations ocfs2_aops = {
 	.readpage	= ocfs2_readpage,
 	.writepage	= ocfs2_writepage,
-	.prepare_write	= ocfs2_prepare_write,
-	.commit_write	= ocfs2_commit_write,
+	.write_begin	= ocfs2_write_begin,
+	.write_end	= ocfs2_write_end,
 	.bmap		= ocfs2_bmap,
 	.sync_page	= block_sync_page,
 	.direct_IO	= ocfs2_direct_IO,
--
1.3.3

-------------- next part --------------
From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Export __grab_cache_page

Needed at least by ocfs2 and ext[23].

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

ec4c66f0e6012a182105405aa11813fbf836629f
diff --git a/mm/filemap.c b/mm/filemap.c
index 327c20f..c4a2d68 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2196,6 +2196,7 @@ repeat:
 	}
 	return page;
 }
+EXPORT_SYMBOL(__grab_cache_page);
 
 static ssize_t generic_perform_write_2copy(struct file *file,
 				struct iov_iter *i, loff_t pos)
--
1.3.3
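
The ocfs2 conversion above swaps the old (page, from, to) arguments for
(pos, len): the page index comes from pos >> PAGE_CACHE_SHIFT and the
in-page range from pos & (PAGE_CACHE_SIZE - 1). A minimal userspace sketch
of that arithmetic follows. It assumes a 4096-byte page and an unsigned
offset for simplicity (the kernel's loff_t is signed), and demo_range and
demo_page_range are hypothetical names, not kernel symbols.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12	/* assumed: 4096-byte pages */
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/* Hypothetical result type: where a write lands in the page cache. */
struct demo_range {
	unsigned long long index;	/* which page the write starts in */
	unsigned from;			/* first byte within that page */
	unsigned to;			/* one past the last byte in that page */
};

/* Split a file offset and length into page index and in-page range. */
struct demo_range demo_page_range(unsigned long long pos, unsigned len)
{
	struct demo_range r;

	r.index = pos >> DEMO_PAGE_SHIFT;
	r.from = (unsigned)(pos & (DEMO_PAGE_SIZE - 1));
	r.to = r.from + len;
	return r;
}
```

For a 50-byte write at file offset 12388 (3 * 4096 + 100), this gives page
index 3, from = 100 and to = 150. The sketch assumes the caller has already
limited len so that the range does not cross a page boundary.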
* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-20 5:36 Nick Piggin
From: Nick Piggin @ 2007-03-20 5:36 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 04:47:13PM -0700, Mark Fasheh wrote:
> On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> > (excludes the OCFS2 patch that Mark sent, in anticipation of an update)
>
> Attached is said patch. I needed to export __grab_cache_page (ext2/ext3 also
> need this if they're to be built as modules), so a patch to do that is also
> attached.
>
> This passed some preliminary testing on a two node cluster I have here at
> Oracle.

Thanks Mark, I've merged these.

Nick