cluster-devel.redhat.com archive mirror
* [Cluster-devel] Announce: new-aops-1 for 2.6.21-rc3
@ 2007-03-15 16:17 Nick Piggin
  2007-03-15 16:17 ` Nick Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Nick Piggin @ 2007-03-15 16:17 UTC (permalink / raw)
  To: cluster-devel.redhat.com

OK, I've gone through and fixed several bugs until the thing actually
survives fsx-linux for both ext2 and ext3 ordered and writeback (both
when using the new aops, and the legacy prepare_write path). Actually
ext3 sometimes breaks, but it does in unpatched kernels anyway.

At 15 patches (including the initial buffered write deadlock fixes),
it is too much to keep posting -- not much has fundamentally changed,
so I'll just post occasionally if we make big changes. The quilt
format is probably easier for someone wishing to work on it anyway.

http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/

(excludes the OCFS2 patch that Mark sent, in anticipation of an update)

It would be really nice if filesystem developers could take a look
at the new interfaces some time, because otherwise they might get stuck
with them :) So I'm cc'ing a few filesystems that come to mind and that I
haven't heard anything from.

Thanks,
Nick




* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
  2007-03-15 16:17 [Cluster-devel] Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin
  2007-03-15 16:17 ` Nick Piggin
@ 2007-03-15 19:32 ` Joel Becker
  2007-03-15 19:57   ` Nick Piggin
  2007-03-15 19:53 ` Mark Fasheh
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Joel Becker @ 2007-03-15 19:32 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> At 15 patches (including the initial buffered write deadlock fixes),
> it is too much to keep posting -- not much has fundamentally changed,
> so I'll just post occasionally if we make big changes. The quilt
> format is probably easier for someone wishing to work on it anyway.
> 
> http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/

	For future drops, can you provide the unpacked patches too, so
lazy people like me can read them in the browser?  Thanks.

Joel

-- 

"Here's something to think about:  How come you never see a headline
 like ``Psychic Wins Lottery''?"
	- Jay Leno

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127




* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
  2007-03-15 16:17 [Cluster-devel] Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin
  2007-03-15 16:17 ` Nick Piggin
  2007-03-15 19:32 ` [Cluster-devel] " Joel Becker
@ 2007-03-15 19:53 ` Mark Fasheh
  2007-03-15 19:57   ` Nick Piggin
  2007-03-15 21:08 ` Mark Fasheh
  2007-03-15 23:47 ` Mark Fasheh
  4 siblings, 1 reply; 9+ messages in thread
From: Mark Fasheh @ 2007-03-15 19:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> OK, I've gone through and fixed several bugs until the thing actually
> survives fsx-linux for both ext2 and ext3 ordered and writeback (both
> when using the new aops, and the legacy prepare_write path). Actually
> ext3 sometimes breaks, but it does in unpatched kernels anyway.
> 
> At 15 patches (including the initial buffered write deadlock fixes),
> it is too much to keep posting -- not much has fundamentally changed,
> so I'll just post occasionally if we make big changes. The quilt
> format is probably easier for someone wishing to work on it anyway.

Hmm, we still left out some exports...
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com


From: Mark Fasheh <mark.fasheh@oracle.com>
[PATCH] Export simple_write_begin, simple_write_end

These are used by configfs, which can be built as a module.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

---

 fs/libfs.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

36f5d6a135c9f3f30fee3d0e4ffa887e1803ac95
diff --git a/fs/libfs.c b/fs/libfs.c
index d687819..51f9748 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -656,6 +656,8 @@ EXPORT_SYMBOL(dcache_dir_open);
 EXPORT_SYMBOL(dcache_readdir);
 EXPORT_SYMBOL(generic_read_dir);
 EXPORT_SYMBOL(get_sb_pseudo);
+EXPORT_SYMBOL(simple_write_begin);
+EXPORT_SYMBOL(simple_write_end);
 EXPORT_SYMBOL(simple_commit_write);
 EXPORT_SYMBOL(simple_dir_inode_operations);
 EXPORT_SYMBOL(simple_dir_operations);
-- 
1.3.3




* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
  2007-03-15 19:32 ` [Cluster-devel] " Joel Becker
@ 2007-03-15 19:57   ` Nick Piggin
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-03-15 19:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 12:32:45PM -0700, Joel Becker wrote:
> On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> > At 15 patches (including the initial buffered write deadlock fixes),
> > it is too much to keep posting -- not much has fundamentally changed,
> > so I'll just post occasionally if we make big changes. The quilt
> > format is probably easier for someone wishing to work on it anyway.
> > 
> > http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
> 
> 	For future drops, can you provide the unpacked patches too, so
> lazy people like me can read them in the browser?  Thanks.

Sorry, I did intend to unpack that, but forgot. It's done now; the
new directory containing the patches is under the same URL as above.

Thanks,
Nick




* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
  2007-03-15 19:53 ` Mark Fasheh
@ 2007-03-15 19:57   ` Nick Piggin
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-03-15 19:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 12:53:51PM -0700, Mark Fasheh wrote:
> On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> > OK, I've gone through and fixed several bugs until the thing actually
> > survives fsx-linux for both ext2 and ext3 ordered and writeback (both
> > when using the new aops, and the legacy prepare_write path). Actually
> > ext3 sometimes breaks, but it does in unpatched kernels anyway.
> > 
> > At 15 patches (including the initial buffered write deadlock fixes),
> > it is too much to keep posting -- not much has fundamentally changed,
> > so I'll just post occasionally if we make big changes. The quilt
> > format is probably easier for someone wishing to work on it anyway.
> 
> Hmm, we still left out some exports...

Thanks, applied.




* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
  2007-03-15 16:17 [Cluster-devel] Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin
                   ` (2 preceding siblings ...)
  2007-03-15 19:53 ` Mark Fasheh
@ 2007-03-15 21:08 ` Mark Fasheh
  2007-03-15 23:47 ` Mark Fasheh
  4 siblings, 0 replies; 9+ messages in thread
From: Mark Fasheh @ 2007-03-15 21:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> OK, I've gone through and fixed several bugs until the thing actually
> survives fsx-linux for both ext2 and ext3 ordered and writeback (both
> when using the new aops, and the legacy prepare_write path). Actually
> ext3 sometimes breaks, but it does in unpatched kernels anyway.

Attached is a bugfix for a crash that anyone using an initrd will hit early on.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com


From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Populate pagep in simple_write_begin()

This wasn't getting passed back to callers.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

cbf20bf51ddd6434db935ba29f845a85f3b1ec65
diff --git a/fs/libfs.c b/fs/libfs.c
index 51f9748..602496a 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -357,6 +357,8 @@ int simple_write_begin(struct file *file
 	if (!page)
 		return -ENOMEM;
 
+	*pagep = page;
+
 	return simple_prepare_write(file, page, from, from+len);
 }
 
-- 
1.3.3
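
A minimal userspace sketch of the contract this fix restores, not the
kernel code itself: a write_begin-style helper hands the page back through
an out-parameter, and a caller that trusts *pagep after a successful return
is left holding NULL if the helper forgets to set it. The struct page below
is a stand-in, and grab_page(), write_begin_buggy() and write_begin_fixed()
are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct page; illustration only. */
struct page {
	unsigned long index;
};

/* Hypothetical allocator standing in for the page cache lookup. */
static struct page *grab_page(unsigned long index)
{
	struct page *page = malloc(sizeof(*page));

	if (page)
		page->index = index;
	return page;
}

/* Pre-fix shape: reports success but never fills in *pagep. */
static int write_begin_buggy(unsigned long index, struct page **pagep)
{
	struct page *page = grab_page(index);

	(void)pagep;		/* the out-parameter is left untouched */
	if (!page)
		return -1;
	free(page);		/* freed only so this sketch does not leak */
	return 0;
}

/* Post-fix shape: the page is handed back before returning success. */
static int write_begin_fixed(unsigned long index, struct page **pagep)
{
	struct page *page = grab_page(index);

	if (!page)
		return -1;
	*pagep = page;
	return 0;
}
```

With the buggy shape, a caller that initialised its pointer to NULL still
sees NULL after a zero return; with the fixed shape it gets the page it
asked for and is responsible for releasing it.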




* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
  2007-03-15 16:17 [Cluster-devel] Announce: new-aops-1 for 2.6.21-rc3 Nick Piggin
                   ` (3 preceding siblings ...)
  2007-03-15 21:08 ` Mark Fasheh
@ 2007-03-15 23:47 ` Mark Fasheh
  2007-03-20  5:36   ` Nick Piggin
  4 siblings, 1 reply; 9+ messages in thread
From: Mark Fasheh @ 2007-03-15 23:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> (excludes the OCFS2 patch that Mark sent, in anticipation of an update)

Attached is said patch. I needed to export __grab_cache_page (ext2/ext3 also
need this if they're to be built as modules), so a patch to do that is also
attached.

This passed some preliminary testing on a two node cluster I have here at
Oracle.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com

-------------- next part --------------
From: Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Convert to new aops

Turn ocfs2_prepare_write() and ocfs2_commit_write() into ocfs2_write_begin()
and ocfs2_write_end(). This conveniently eliminates the need for
AOP_TRUNCATED_PAGE during write.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


e28911070b02362a9a3a543646da84a8fbf9f63b
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 875c114..cbec0e1 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -293,29 +293,67 @@ int ocfs2_prepare_write_nolock(struct in
 }
 
 /*
- * ocfs2_prepare_write() can be an outer-most ocfs2 call when it is called
- * from loopback.  It must be able to perform its own locking around
- * ocfs2_get_block().
+ * ocfs2_write_begin() can be an outer-most ocfs2 call when it is
+ * called from elsewhere in the kernel. It must be able to perform its
+ * own locking around ocfs2_get_block().
  */
-static int ocfs2_prepare_write(struct file *file, struct page *page,
-			       unsigned from, unsigned to)
+static int ocfs2_write_begin(struct file *file, struct address_space *mapping,
+			     loff_t pos, unsigned len, unsigned flags,
+			     struct page **pagep, void **fsdata)
 {
-	struct inode *inode = page->mapping->host;
+	struct inode *inode = mapping->host;
+	struct buffer_head *di_bh = NULL;
+	struct page *page = NULL;
 	int ret;
 
-	mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to);
-
-	ret = ocfs2_meta_lock_with_page(inode, NULL, 0, page);
+	ret = ocfs2_meta_lock(inode, &di_bh, 1);
 	if (ret != 0) {
 		mlog_errno(ret);
+		return ret;
+	}
+
+	ret = ocfs2_data_lock(inode, 1);
+	if (ret) {
+		ocfs2_meta_unlock(inode, 1);
+
+		mlog_errno(ret);
+		return ret;
+	}
+
+	/*
+	 * Lock the page out here to preserve ordering with
+	 * ip_alloc_sem.
+	 */
+	page = __grab_cache_page(mapping, pos >> PAGE_CACHE_SHIFT);
+	if (!page) {
+		ret = -ENOMEM;
+		mlog_errno(ret);
 		goto out;
 	}
 
-	ret = ocfs2_prepare_write_nolock(inode, page, from, to);
+	*pagep = page;
 
-	ocfs2_meta_unlock(inode, 0);
+	down_read(&OCFS2_I(inode)->ip_alloc_sem);
+	ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+				ocfs2_get_block);
+	up_read(&OCFS2_I(inode)->ip_alloc_sem);
 out:
-	mlog_exit(ret);
+	if (ret == 0) {
+		*fsdata = di_bh;
+	} else {
+		/*
+		 * Error return - the caller won't call
+		 * ocfs2_write_end, so drop cluster locks here.
+		 */
+		brelse(di_bh);
+		if (page) {
+			unlock_page(page);
+			page_cache_release(page);
+		}
+		ocfs2_data_unlock(inode, 1);
+		ocfs2_meta_unlock(inode, 1);
+	}
+
 	return ret;
 }
 
@@ -388,16 +426,18 @@ out:
 	return handle;
 }
 
-static int ocfs2_commit_write(struct file *file, struct page *page,
-			      unsigned from, unsigned to)
+static int ocfs2_write_end(struct file *file, struct address_space *mapping,
+			   loff_t pos, unsigned len, unsigned copied,
+			   struct page *page, void *fsdata)
 {
 	int ret;
-	struct buffer_head *di_bh = NULL;
+	unsigned from, to;
+	struct buffer_head *di_bh = fsdata;
 	struct inode *inode = page->mapping->host;
 	handle_t *handle = NULL;
 	struct ocfs2_dinode *di;
 
-	mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to);
+	mlog_entry("(0x%p, 0x%p)\n", file, page);
 
 	/* NOTE: ocfs2_file_aio_write has ensured that it's safe for
 	 * us to continue here without rechecking the I/O against
@@ -412,22 +452,13 @@ static int ocfs2_commit_write(struct fil
 	 *    stale inode allocation image (i_size, i_clusters, etc).
 	 */
 
-	ret = ocfs2_meta_lock_with_page(inode, &di_bh, 1, page);
-	if (ret != 0) {
-		mlog_errno(ret);
-		goto out;
-	}
-
-	ret = ocfs2_data_lock_with_page(inode, 1, page);
-	if (ret != 0) {
-		mlog_errno(ret);
-		goto out_unlock_meta;
-	}
+	from = pos & (PAGE_CACHE_SIZE - 1);
+	to = from + len;
 
 	handle = ocfs2_start_walk_page_trans(inode, page, from, to);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
-		goto out_unlock_data;
+		goto out_unlock;
 	}
 
 	/* Mark our buffer early. We'd rather catch this error up here
@@ -441,8 +472,10 @@ static int ocfs2_commit_write(struct fil
 	}
 
 	/* might update i_size */
-	ret = generic_commit_write(file, page, from, to);
-	if (ret < 0) {
+	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
+	if (copied < 0) {
+		ret = copied;
+		copied = 0;
 		mlog_errno(ret);
 		goto out_commit;
 	}
@@ -458,23 +491,30 @@ static int ocfs2_commit_write(struct fil
 	di->i_size = cpu_to_le64((u64)i_size_read(inode));
 
 	ret = ocfs2_journal_dirty(handle, di_bh);
-	if (ret < 0) {
+	if (ret < 0)
 		mlog_errno(ret);
-		goto out_commit;
-	}
 
+	ret = 0;
 out_commit:
 	ocfs2_commit_trans(OCFS2_SB(inode->i_sb), handle);
-out_unlock_data:
+out_unlock:
 	ocfs2_data_unlock(inode, 1);
-out_unlock_meta:
 	ocfs2_meta_unlock(inode, 1);
-out:
+
+	if (ret) {
+		/*
+		 * We caught an error before block_write_end() -
+		 * unlock and free the page.
+		 */
+		unlock_page(page);
+		page_cache_release(page);
+	}
+
 	if (di_bh)
 		brelse(di_bh);
 
 	mlog_exit(ret);
-	return ret;
+	return copied ? copied : ret;
 }
 
 static sector_t ocfs2_bmap(struct address_space *mapping, sector_t block)
@@ -678,8 +718,8 @@ out:
 const struct address_space_operations ocfs2_aops = {
 	.readpage	= ocfs2_readpage,
 	.writepage	= ocfs2_writepage,
-	.prepare_write	= ocfs2_prepare_write,
-	.commit_write	= ocfs2_commit_write,
+	.write_begin	= ocfs2_write_begin,
+	.write_end	= ocfs2_write_end,
 	.bmap		= ocfs2_bmap,
 	.sync_page	= block_sync_page,
 	.direct_IO	= ocfs2_direct_IO,
-- 
1.3.3
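
A rough userspace sketch of two pieces of arithmetic in the patch above,
under stated assumptions. First, how the page index and the in-page
from/to range fall out of (pos, len); the 4 KiB page size is an assumption,
since PAGE_CACHE_SHIFT depends on the architecture. Second, a point that
may be worth checking in ocfs2_write_end(): copied is declared unsigned in
the new signature, so if block_write_end() can return a negative errno (as
the "copied < 0" check implies), that check would not fire as posted.
span_for_write(), store_in_unsigned() and bytes_or_error() are hypothetical
helpers, not kernel functions.

```c
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel's PAGE_CACHE_SHIFT is per-architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

struct page_span {
	uint64_t index;	/* which page of the file: pos >> shift */
	unsigned from;	/* first byte touched within that page */
	unsigned to;	/* one past the last byte touched */
};

/* Mirrors the patch's arithmetic: index, from and to derived from (pos, len). */
static struct page_span span_for_write(uint64_t pos, unsigned len)
{
	struct page_span s;

	s.index = pos >> SKETCH_PAGE_SHIFT;
	s.from = (unsigned)(pos & (SKETCH_PAGE_SIZE - 1));
	s.to = s.from + len;
	return s;
}

/*
 * Storing a negative errno in an unsigned wraps it to a large positive
 * value, so a later "copied < 0" comparison can never be true.
 */
static unsigned store_in_unsigned(int rc)
{
	return (unsigned)rc;
}

/*
 * One possible alternative: keep the result in a signed int, test that,
 * and only then publish the byte count through the unsigned variable.
 */
static int bytes_or_error(int rc, unsigned *copied)
{
	if (rc < 0) {
		*copied = 0;
		return rc;	/* propagate the error code */
	}
	*copied = (unsigned)rc;
	return 0;
}
```

For example, a 50-byte write at file offset 4196 lands in page 1 and covers
bytes 100 to 150 of that page.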

-------------- next part --------------
From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Export __grab_cache_page

Needed at least by ocfs2 and ext[23].

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


ec4c66f0e6012a182105405aa11813fbf836629f
diff --git a/mm/filemap.c b/mm/filemap.c
index 327c20f..c4a2d68 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2196,6 +2196,7 @@ repeat:
 	}
 	return page;
 }
+EXPORT_SYMBOL(__grab_cache_page);
 
 static ssize_t generic_perform_write_2copy(struct file *file,
 				struct iov_iter *i, loff_t pos)
-- 
1.3.3



* [Cluster-devel] Re: Announce: new-aops-1 for 2.6.21-rc3
  2007-03-15 23:47 ` Mark Fasheh
@ 2007-03-20  5:36   ` Nick Piggin
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-03-20  5:36 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Mar 15, 2007 at 04:47:13PM -0700, Mark Fasheh wrote:
> On Thu, Mar 15, 2007 at 05:17:04PM +0100, Nick Piggin wrote:
> > (excludes the OCFS2 patch that Mark sent, in anticipation of an update)
> 
> Attached is said patch. I needed to export __grab_cache_page (ext2/ext3 also
> need this if they're to be built as modules), so a patch to do that is also
> attached.
> 
> This passed some preliminary testing on a two node cluster I have here at
> Oracle.

Thanks Mark, I've merged these.

Nick


