linux-fsdevel.vger.kernel.org archive mirror
* [RFC] ext3 writepages for writeback mode
@ 2005-02-11  1:31 Badari Pulavarty
  2005-02-11  1:53 ` Andrew Morton
  2005-02-14 16:50 ` Thiago Rondon
  0 siblings, 2 replies; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-11  1:31 UTC (permalink / raw)
  To: ext2-devel, linux-fsdevel; +Cc: sct, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 251 bytes --]

Hi,

Here is my first cut at adding writepages() support for
ext3 writeback mode.

I have not done any performance analysis on the patch, 
so try it at your own risk.

Please let me know if I am completely off or if it's a
stupid idea.

Thanks,
Badari



[-- Attachment #2: ext3-writeback-writepages.patch --]
[-- Type: text/plain, Size: 1559 bytes --]

--- linux-2.6.10.org/fs/ext3/inode.c	2004-12-06 11:45:49.000000000 -0800
+++ linux-2.6.10/fs/ext3/inode.c	2005-02-10 18:14:17.987263744 -0800
@@ -856,6 +856,12 @@
 	return ret;
 }
 
+static int ext3_writepages_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh, int create)
+{
+	return ext3_direct_io_get_blocks(inode, iblock, 1, bh, create);
+}
+
 /*
  * `handle' can be NULL if create is zero
  */
@@ -1321,6 +1327,37 @@
 	return ret;
 }
 
+static int
+ext3_writeback_writepages(struct address_space *mapping, 
+				struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	handle_t *handle = NULL;
+	int err, ret = 0;
+
+	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		return ret;
+
+	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		return ret;
+	}
+
+        ret = mpage_writepages(mapping, wbc, ext3_writepages_get_block);
+
+	/*
+	 * Need to reacquire the handle since ext3_writepages_get_block()
+	 * can restart the handle
+	 */
+	handle = journal_current_handle();
+
+	err = ext3_journal_stop(handle);
+	if (!ret)
+		ret = err;
+	return ret;
+}
+
 static int ext3_writeback_writepage(struct page *page,
 				struct writeback_control *wbc)
 {
@@ -1552,6 +1589,7 @@
 	.readpage	= ext3_readpage,
 	.readpages	= ext3_readpages,
 	.writepage	= ext3_writeback_writepage,
+	.writepages	= ext3_writeback_writepages,
 	.sync_page	= block_sync_page,
 	.prepare_write	= ext3_prepare_write,
 	.commit_write	= ext3_writeback_commit_write,

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-11  1:31 [RFC] ext3 writepages for writeback mode Badari Pulavarty
@ 2005-02-11  1:53 ` Andrew Morton
  2005-02-11  3:06   ` Badari Pulavarty
  2005-02-11 23:29   ` Badari Pulavarty
  2005-02-14 16:50 ` Thiago Rondon
  1 sibling, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2005-02-11  1:53 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: ext2-devel, linux-fsdevel, sct

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
>  Here is my first cut at adding writepages() support for
>  ext3 writeback mode.

Looks sane from a brief scan.

>  I have not done any performance analysis on the patch, 

Please do ;)

>  +static int ext3_writepages_get_block(struct inode *inode, sector_t iblock,
>  +			struct buffer_head *bh, int create)
>  +{
>  +	return ext3_direct_io_get_blocks(inode, iblock, 1, bh, create);
>  +}

yup.

>  +
>   /*
>    * `handle' can be NULL if create is zero
>    */
>  @@ -1321,6 +1327,37 @@
>   	return ret;
>   }
>   
>  +static int
>  +ext3_writeback_writepages(struct address_space *mapping, 
>  +				struct writeback_control *wbc)
>  +{
>  +	struct inode *inode = mapping->host;
>  +	handle_t *handle = NULL;
>  +	int err, ret = 0;
>  +
>  +	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
>  +		return ret;
>  +
>  +	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
>  +	if (IS_ERR(handle)) {
>  +		ret = PTR_ERR(handle);
>  +		return ret;
>  +	}
>  +
>  +        ret = mpage_writepages(mapping, wbc, ext3_writepages_get_block);
>  +

Funny whitespace.  What is it with you IBM guys? ;)

>  +	/*
>  +	 * Need to reacquire the handle since ext3_writepages_get_block()
>  +	 * can restart the handle
>  +	 */
>  +	handle = journal_current_handle();
>  +
>  +	err = ext3_journal_stop(handle);
>  +	if (!ret)
>  +		ret = err;
>  +	return ret;
>  +}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-11  1:53 ` Andrew Morton
@ 2005-02-11  3:06   ` Badari Pulavarty
  2005-02-11 23:29   ` Badari Pulavarty
  1 sibling, 0 replies; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-11  3:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ext2-devel, linux-fsdevel, sct

Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> 
>> Here is my first cut at adding writepages() support for
>> ext3 writeback mode.
> 
> 
> Looks sane from a brief scan.

Well, not really..

mpage_writepages() could end up calling ext3_writeback_writepage()
in the "confused" case through:

	*ret = page->mapping->a_ops->writepage(page, wbc);

which ends up doing nothing and leaves the page dirty, since there is
already a journal handle started :(

	if (ext3_journal_current_handle())
		goto out_fail;
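
The control flow described above can be modelled in userspace. This is only a minimal sketch: `toy_page`, `toy_handle_active`, `toy_writepage()` and `toy_writepages()` are hypothetical stand-ins, not kernel code, and the only thing they are meant to show is that a writepage which bails out when a handle is active leaves the page dirty when it is reached from inside writepages.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a page cache page. */
struct toy_page {
	bool dirty;
};

/* Models "a journal handle is already started" for the current task. */
static bool toy_handle_active;

/* Models the guard quoted above: refuse to do work if a handle exists. */
static int toy_writepage(struct toy_page *page)
{
	if (toy_handle_active)
		return 0;		/* bail out; the page stays dirty */
	page->dirty = false;		/* pretend the page was written */
	return 0;
}

/* Models ->writepages(): start a handle, then take the "confused" path. */
static int toy_writepages(struct toy_page *page)
{
	int ret;

	toy_handle_active = true;
	ret = toy_writepage(page);	/* fallback to the per-page method */
	toy_handle_active = false;
	return ret;
}
```

Calling `toy_writepages()` on a dirty page returns 0 but leaves `dirty` set, while a direct `toy_writepage()` call outside any handle cleans it.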

Ideas ?

Thanks,
Badari

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-11  1:53 ` Andrew Morton
  2005-02-11  3:06   ` Badari Pulavarty
@ 2005-02-11 23:29   ` Badari Pulavarty
  2005-02-11 23:58     ` Andrew Morton
  1 sibling, 1 reply; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-11 23:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ext2-devel, linux-fsdevel, sct

[-- Attachment #1: Type: text/plain, Size: 232 bytes --]

Hi Andrew,

Due to a lack of interesting suggestions to solve the
mpage_writepages() -> ext3_writeback_writepage() problem,
I fixed it in the dumbest possible way.

Please don't kill me :)

Let me know what you think.

Thanks,
Badari



[-- Attachment #2: ext3-writeback-writepages.patch2 --]
[-- Type: text/x-patch, Size: 5022 bytes --]

diff -Naurp linux-2.6.10/include/linux/fs.h linux-2.6.10.new/include/linux/fs.h
--- linux-2.6.10/include/linux/fs.h	2004-12-24 13:34:27.000000000 -0800
+++ linux-2.6.10.new/include/linux/fs.h	2005-02-11 15:39:12.000000000 -0800
@@ -27,6 +27,7 @@ struct poll_table_struct;
 struct kstatfs;
 struct vm_area_struct;
 struct vfsmount;
+struct writeback_control;
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -244,6 +245,8 @@ typedef int (get_blocks_t)(struct inode 
 			struct buffer_head *bh_result, int create);
 typedef void (dio_iodone_t)(struct inode *inode, loff_t offset,
 			ssize_t bytes, void *private);
+typedef int (writepage_t)(struct page *page, struct writeback_control *wbc);
 
 /*
  * Attribute flags.  These should be or-ed together to figure out what
diff -Naurp linux-2.6.10/include/linux/mpage.h linux-2.6.10.new/include/linux/mpage.h
--- linux-2.6.10/include/linux/mpage.h	2004-12-24 13:34:32.000000000 -0800
+++ linux-2.6.10.new/include/linux/mpage.h	2005-02-11 15:40:26.000000000 -0800
@@ -17,6 +17,9 @@ int mpage_readpages(struct address_space
 int mpage_readpage(struct page *page, get_block_t get_block);
 int mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block);
+int __mpage_writepages(struct address_space *mapping,
+		struct writeback_control *wbc, get_block_t get_block,
+		writepage_t writepage);
 
 static inline int
 generic_writepages(struct address_space *mapping, struct writeback_control *wbc)
--- linux-2.6.10/fs/mpage.c	2004-12-24 13:34:26.000000000 -0800
+++ linux-2.6.10.new/fs/mpage.c	2005-02-11 16:14:39.338838584 -0800
@@ -387,7 +387,8 @@ EXPORT_SYMBOL(mpage_readpage);
  */
 static struct bio *
 mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
-	sector_t *last_block_in_bio, int *ret, struct writeback_control *wbc)
+	sector_t *last_block_in_bio, int *ret, struct writeback_control *wbc,
+	writepage_t writepage_helper)
 {
 	struct address_space *mapping = page->mapping;
 	struct inode *inode = page->mapping->host;
@@ -580,7 +581,7 @@ alloc_new:
 confused:
 	if (bio)
 		bio = mpage_bio_submit(WRITE, bio);
-	*ret = page->mapping->a_ops->writepage(page, wbc);
+	*ret = writepage_helper(page, wbc);
 	/*
 	 * The caller has a ref on the inode, so *mapping is stable
 	 */
@@ -619,6 +620,15 @@ int
 mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block)
 {
+	return __mpage_writepages(mapping, wbc, get_block,
+			 mapping->a_ops->writepage);
+}
+
+int
+__mpage_writepages(struct address_space *mapping,
+		struct writeback_control *wbc, get_block_t get_block,
+		writepage_t writepage_helper)
+{
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
@@ -707,7 +717,8 @@ retry:
 				}
 			} else {
 				bio = mpage_writepage(bio, page, get_block,
-						&last_block_in_bio, &ret, wbc);
+						&last_block_in_bio, &ret, wbc,
+						writepage_helper);
 			}
 			if (ret || (--(wbc->nr_to_write) <= 0))
 				done = 1;
@@ -735,3 +746,4 @@ retry:
 	return ret;
 }
 EXPORT_SYMBOL(mpage_writepages);
+EXPORT_SYMBOL(__mpage_writepages);
--- linux-2.6.10/fs/ext3/inode.c	2004-12-24 13:35:01.000000000 -0800
+++ linux-2.6.10.new/fs/ext3/inode.c	2005-02-11 16:16:02.566186104 -0800
@@ -856,6 +856,12 @@ get_block:
 	return ret;
 }
 
+static int ext3_writepages_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh, int create)
+{
+	return ext3_direct_io_get_blocks(inode, iblock, 1, bh, create);
+}
+
 /*
  * `handle' can be NULL if create is zero
  */
@@ -1321,6 +1327,44 @@ out_fail:
 	return ret;
 }
 
+static int ext3_writeback_writepage_helper(struct page *page,
+				struct writeback_control *wbc)
+{
+	return block_write_full_page(page, ext3_get_block, wbc);
+}
+
+static int
+ext3_writeback_writepages(struct address_space *mapping, 
+				struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	handle_t *handle = NULL;
+	int err, ret = 0;
+
+	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		return ret;
+
+	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		return ret;
+	}
+
+	ret = __mpage_writepages(mapping, wbc, ext3_writepages_get_block,
+					ext3_writeback_writepage_helper);
+
+	/*
+	 * Need to reacquire the handle since ext3_writepages_get_block()
+	 * can restart the handle
+	 */
+	handle = journal_current_handle();
+
+	err = ext3_journal_stop(handle);
+	if (!ret)
+		ret = err;
+	return ret;
+}
+
 static int ext3_writeback_writepage(struct page *page,
 				struct writeback_control *wbc)
 {
@@ -1552,6 +1596,7 @@ static struct address_space_operations e
 	.readpage	= ext3_readpage,
 	.readpages	= ext3_readpages,
 	.writepage	= ext3_writeback_writepage,
+	.writepages	= ext3_writeback_writepages,
 	.sync_page	= block_sync_page,
 	.prepare_write	= ext3_prepare_write,
 	.commit_write	= ext3_writeback_commit_write,

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-11 23:29   ` Badari Pulavarty
@ 2005-02-11 23:58     ` Andrew Morton
  2005-02-12  0:09       ` Badari Pulavarty
  2005-02-12  0:51       ` Badari Pulavarty
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2005-02-11 23:58 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: ext2-devel, linux-fsdevel, sct

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> Due to lack of interesting suggestions to solve 
> mpage_writepages() -> ext3_writeback_writepage() problem,
> I fixed it in the dumbest possible way.

I've actually forgotten what the problem was.  It was 100 patches ago :(

> Let me know, what you think.


If it works, let's get some benchmark numbers so we can decide whether it
justifies more development?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-11 23:58     ` Andrew Morton
@ 2005-02-12  0:09       ` Badari Pulavarty
  2005-02-14 20:02         ` Sonny Rao
  2005-02-12  0:51       ` Badari Pulavarty
  1 sibling, 1 reply; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-12  0:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ext2-devel, linux-fsdevel, sct

On Fri, 2005-02-11 at 15:58, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > Due to lack of interesting suggestions to solve 
> > mpage_writepages() -> ext3_writeback_writepage() problem,
> > I fixed it in the dumbest possible way.
> 
> I've actually forgotten what the problem was.  It was 100 patches ago :(

The problem was that
	ext3_writeback_writepages() -> mpage_writepages() could
call back into ext3_writeback_writepage() in the "confused" case.
ext3_writeback_writepage() could end up doing nothing, since
we already have a journal handle.

I added a writepage_helper to handle this case.
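
The shape of that fix, passing the fallback in as a parameter while the old entry point supplies the default, can be sketched in userspace. All `demo_*` names below are hypothetical and only mirror the roles of `mpage_writepages()`, `__mpage_writepages()` and the helper in the patch; this is not the kernel code itself.

```c
#include <stdbool.h>

struct demo_page {
	bool dirty;
};

struct demo_mapping {
	bool handle_active;	/* models "journal handle started" */
	int (*writepage)(struct demo_mapping *, struct demo_page *);
};

/* Function type for the fallback, in the style of writepage_t. */
typedef int (demo_writepage_t)(struct demo_mapping *, struct demo_page *);

/* Default ->writepage: refuses to run while a handle is active. */
static int demo_guarded_writepage(struct demo_mapping *m, struct demo_page *p)
{
	if (m->handle_active)
		return 0;	/* page left dirty */
	p->dirty = false;
	return 0;
}

/* Helper without the guard, playing the role of the new helper. */
static int demo_unguarded_writepage(struct demo_mapping *m, struct demo_page *p)
{
	(void)m;
	p->dirty = false;
	return 0;
}

/* Core routine: the "confused" fallback is whatever the caller passed. */
static int demo_writepages_core(struct demo_mapping *m, struct demo_page *p,
				demo_writepage_t *fallback)
{
	return fallback(m, p);
}

/* Existing entry point keeps its behaviour by passing the default. */
static int demo_writepages(struct demo_mapping *m, struct demo_page *p)
{
	return demo_writepages_core(m, p, m->writepage);
}
```

Existing callers of `demo_writepages()` see no change, while a caller that already holds a handle can go through `demo_writepages_core()` with the unguarded helper.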

> 
> > Let me know, what you think.
> 
> 
> If it works, let's get some benchmark numbers so we can decide whether it
> justifies more development?
> 

Yep. I will get some numbers to see ..

Thanks,
Badari


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-11 23:58     ` Andrew Morton
  2005-02-12  0:09       ` Badari Pulavarty
@ 2005-02-12  0:51       ` Badari Pulavarty
  2005-02-12  1:00         ` Andrew Morton
  2005-02-12 12:20         ` [Ext2-devel] " Alex Tomas
  1 sibling, 2 replies; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-12  0:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ext2-devel, linux-fsdevel, sct

On Fri, 2005-02-11 at 15:58, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > Due to lack of interesting suggestions to solve 
> > mpage_writepages() -> ext3_writeback_writepage() problem,
> > I fixed it in the dumbest possible way.
> 
> I've actually forgotten what the problem was.  It was 100 patches ago :(
> 
> > Let me know, what you think.
> 
> 
> If it works, let's get some benchmark numbers so we can decide whether it
> justifies more development?

Okay, here is some quick data I collected. More to follow..

Test: writes 10,000 blocks of 64k and does fdatasync().
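
The source of /tmp/writer is not included in this thread. A rough equivalent of the test as described might look like the sketch below; the function name, fill pattern and open flags are assumptions. In particular, the file is opened without O_TRUNC on the guess that repeated runs are meant to overwrite already-allocated blocks (the "no alloc" runs).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `count` blocks of `block_size` bytes to `path`, then fdatasync().
 * Returns 0 on success and -1 on any error.
 */
static int write_blocks_and_sync(const char *path, size_t block_size, long count)
{
	char *buf;
	int fd;
	int ret = -1;
	long i;

	buf = malloc(block_size);
	if (!buf)
		return -1;
	memset(buf, 0xab, block_size);	/* arbitrary fill pattern */

	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0)
		goto out_free;

	for (i = 0; i < count; i++) {
		size_t done = 0;

		/* write() may be short, so loop until the block is out */
		while (done < block_size) {
			ssize_t n = write(fd, buf + done, block_size - done);

			if (n <= 0)
				goto out_close;
			done += (size_t)n;
		}
	}

	/* flush file data (not necessarily all metadata) to disk */
	if (fdatasync(fd) == 0)
		ret = 0;
out_close:
	if (close(fd) != 0)
		ret = -1;
out_free:
	free(buf);
	return ret;
}
```

For the run described here the call would be `write_blocks_and_sync("file", 64 * 1024, 10000)`, wrapped in a small `main()` and timed with `time`.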

Thanks,
Badari

BEFORE: (without writepages support)
                                                                                                       
elm3b29:/mnt # touch file
elm3b29:/mnt # time /tmp/writer file
real    0m23.746s user    0m0.000s sys     0m5.020s (allocation)
elm3b29:/mnt # time /tmp/writer file
real    0m20.950s user    0m0.001s sys     0m2.278s (no alloc)
elm3b29:/mnt # time /tmp/writer file
real    0m21.030s user    0m0.001s sys     0m2.254s (no alloc)
elm3b29:/mnt # time /tmp/writer file
real    0m20.577s user    0m0.001s sys     0m2.184s (no alloc)
                                                                                                       
====
AFTER: (with writepages support)
                                                                                                       
elm3b29:/mnt # touch file
elm3b29:/mnt # time /tmp/writer file
real    0m23.230s user    0m0.001s sys     0m4.132s (allocation)
elm3b29:/mnt # time /tmp/writer file
real    0m20.175s user    0m0.004s sys     0m1.756s (no alloc)
elm3b29:/mnt # time /tmp/writer file
real    0m20.368s user    0m0.001s sys     0m1.696s (no alloc)
elm3b29:/mnt # time /tmp/writer file
real    0m20.626s user    0m0.002s sys     0m1.763s (no alloc)
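
For a rough comparison, the sys times of the three "no alloc" runs in each listing can be averaged. The sketch below hard-codes those six values from the output above; the helper names are made up for illustration, and three samples per side are far too few to say anything statistically firm.

```c
#include <stddef.h>

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return n ? sum / (double)n : 0.0;
}

/* sys seconds of the "no alloc" runs, copied from the listings above */
static const double sys_before[] = { 2.278, 2.254, 2.184 };
static const double sys_after[]  = { 1.756, 1.696, 1.763 };

/* Fractional reduction in mean sys time: (before - after) / before. */
static double sys_reduction(void)
{
	double b = mean(sys_before, sizeof(sys_before) / sizeof(sys_before[0]));
	double a = mean(sys_after, sizeof(sys_after) / sizeof(sys_after[0]));

	return (b - a) / b;
}
```

With these samples the mean sys time drops by roughly 22%, while the real times stay within about a second of each other.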




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-12  0:51       ` Badari Pulavarty
@ 2005-02-12  1:00         ` Andrew Morton
  2005-02-12  1:55           ` Badari Pulavarty
  2005-02-12 12:20         ` [Ext2-devel] " Alex Tomas
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2005-02-12  1:00 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: ext2-devel, linux-fsdevel, sct

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> BEFORE: (without writepages support)
>                                                                                                        
> elm3b29:/mnt # touch file
> elm3b29:/mnt # time /tmp/writer file
> real    0m23.746s user    0m0.000s sys     0m5.020s (allocation)
> elm3b29:/mnt # time /tmp/writer file
> real    0m20.950s user    0m0.001s sys     0m2.278s (no alloc)
> elm3b29:/mnt # time /tmp/writer file
> real    0m21.030s user    0m0.001s sys     0m2.254s (no alloc)
> elm3b29:/mnt # time /tmp/writer file
> real    0m20.577s user    0m0.001s sys     0m2.184s (no alloc)
>                                                                                                        
> ====
> AFTER: (with writepages support)
>                                                                                                        
> elm3b29:/mnt # touch file
> elm3b29:/mnt # time /tmp/writer file
> real    0m23.230s user    0m0.001s sys     0m4.132s (allocation)
> elm3b29:/mnt # time /tmp/writer file
> real    0m20.175s user    0m0.004s sys     0m1.756s (no alloc)
> elm3b29:/mnt # time /tmp/writer file
> real    0m20.368s user    0m0.001s sys     0m1.696s (no alloc)
> elm3b29:/mnt # time /tmp/writer file
> real    0m20.626s user    0m0.002s sys     0m1.763s (no alloc)

Holy cow.  I'm shocked.

There's no system CPU time involved, and the user CPU time didn't change. 
We must be getting better I/O scheduling for some reason.  I wonder what it
is?

That, or we're forgetting to write something ;)

What journalling mode were you using?
What I/O scheduler?
What sort of disk system?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] ext3 writepages for writeback mode
  2005-02-12  1:00         ` Andrew Morton
@ 2005-02-12  1:55           ` Badari Pulavarty
  0 siblings, 0 replies; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-12  1:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ext2-devel, linux-fsdevel, sct

On Fri, 2005-02-11 at 17:00, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > BEFORE: (without writepages support)
> >                                                                                                        
> > elm3b29:/mnt # touch file
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m23.746s user    0m0.000s sys     0m5.020s (allocation)
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m20.950s user    0m0.001s sys     0m2.278s (no alloc)
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m21.030s user    0m0.001s sys     0m2.254s (no alloc)
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m20.577s user    0m0.001s sys     0m2.184s (no alloc)
> >                                                                                                        
> > ====
> > AFTER: (with writepages support)
> >                                                                                                        
> > elm3b29:/mnt # touch file
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m23.230s user    0m0.001s sys     0m4.132s (allocation)
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m20.175s user    0m0.004s sys     0m1.756s (no alloc)
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m20.368s user    0m0.001s sys     0m1.696s (no alloc)
> > elm3b29:/mnt # time /tmp/writer file
> > real    0m20.626s user    0m0.002s sys     0m1.763s (no alloc)
> 
> Holy cow.  I'm shocked.
> 
> There's no system CPU time involved, and the user CPU time didn't change. 
> We must be getting better I/O scheduling for some reason.  I wonder what it
> is?
> 
> That, or we're forgetting to write something ;)

I hope not.

> 
> What journalling mode were you using?

writeback mode. I hacked only writeback mode for writepages().

> What I/O scheduler?

anticipatory (2.6.10 default)


> What sort of disk system?

Simple JBOD, single disk.

Thanks,
Badari


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [RFC] ext3 writepages for writeback mode
  2005-02-12  0:51       ` Badari Pulavarty
  2005-02-12  1:00         ` Andrew Morton
@ 2005-02-12 12:20         ` Alex Tomas
  2005-02-12 18:47           ` Badari Pulavarty
  2005-02-12 23:26           ` Badari Pulavarty
  1 sibling, 2 replies; 20+ messages in thread
From: Alex Tomas @ 2005-02-12 12:20 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, ext2-devel, linux-fsdevel, sct


>>>>> Badari Pulavarty (BP) writes:

 BP> Test: writes 10,000 blocks of 64k and does fdatasync().

The patch doesn't apply ;) after a minor correction
I've tested the patch too:

SMP, before:
[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m17.748s user    0m0.032s sys     0m6.031s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m16.016s user    0m0.015s sys     0m3.819s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m19.074s user    0m0.014s sys     0m3.761s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m22.011s user    0m0.010s sys     0m3.773s


SMP, after:
[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m20.676s user    0m0.023s sys     0m5.643s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m13.188s user    0m0.016s sys     0m3.511s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m23.725s user    0m0.017s sys     0m3.399s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m23.684s user    0m0.013s sys     0m3.421s 



UP, after:
[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m21.452s user    0m0.026s sys     0m6.105s

[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m23.317s user    0m0.027s sys     0m4.206s

[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m25.190s user    0m0.022s sys     0m4.026s

[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m25.024s user    0m0.021s sys     0m4.103s


UP, before:
[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m19.138s user    0m0.021s sys     0m5.980s

[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m23.123s user    0m0.026s sys     0m4.083s

[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m25.152s user    0m0.017s sys     0m3.836s

[root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
real    0m25.217s user    0m0.026s sys     0m3.827s




AFAICS, the patch makes sense, but in my setup it doesn't
reduce sys.time so dramatically. all my boxes are P3-based.

thanks, Alex


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [RFC] ext3 writepages for writeback mode
  2005-02-12 12:20         ` [Ext2-devel] " Alex Tomas
@ 2005-02-12 18:47           ` Badari Pulavarty
  2005-02-12 21:43             ` Alex Tomas
  2005-02-12 23:26           ` Badari Pulavarty
  1 sibling, 1 reply; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-12 18:47 UTC (permalink / raw)
  To: Alex Tomas; +Cc: Andrew Morton, ext2-devel, linux-fsdevel, sct

Alex Tomas wrote:
>>>>>>Badari Pulavarty (BP) writes:
> 
> 
>  BP> Test: writes 10,000 blocks of 64k and does fdatasync().
> 
> The patch doesn't apply ;) after a minor correction
> I've tested the patch too:
> 
> SMP, before:
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m17.748s user    0m0.032s sys     0m6.031s
> 
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m16.016s user    0m0.015s sys     0m3.819s
> 
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m19.074s user    0m0.014s sys     0m3.761s
> 
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m22.011s user    0m0.010s sys     0m3.773s
> 
> 
> SMP, after:
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m20.676s user    0m0.023s sys     0m5.643s
> 
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m13.188s user    0m0.016s sys     0m3.511s
> 
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m23.725s user    0m0.017s sys     0m3.399s
> 
> [root@bob root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m23.684s user    0m0.013s sys     0m3.421s 
> 
> 
> 
> UP, after:
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m21.452s user    0m0.026s sys     0m6.105s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m23.317s user    0m0.027s sys     0m4.206s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.190s user    0m0.022s sys     0m4.026s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.024s user    0m0.021s sys     0m4.103s
> 
> 
> UP, before:
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m19.138s user    0m0.021s sys     0m5.980s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m23.123s user    0m0.026s sys     0m4.083s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.152s user    0m0.017s sys     0m3.836s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.217s user    0m0.026s sys     0m3.827s
> 
> 
> 
> 
> AFAICS, the patch makes sense, but in my setup it doesn't
> reduce sys.time so dramatically. all my boxes are P3-based.
> 
> thanks, Alex
> 
> 

Thank you for testing and confirming the results.
Is this on a simple single disk configuration ?

Thanks,
Badari

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [RFC] ext3 writepages for writeback mode
  2005-02-12 18:47           ` Badari Pulavarty
@ 2005-02-12 21:43             ` Alex Tomas
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Tomas @ 2005-02-12 21:43 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Alex Tomas, Andrew Morton, ext2-devel, linux-fsdevel, sct

>>>>> Badari Pulavarty (BP) writes:


 BP> Thank you for testing and confirming the results.
 BP> Is this on a simple single disk configuration ?

nope, the boxes connect to a 2-disk raid0 via FC1.
but the disks are damn old ...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [RFC] ext3 writepages for writeback mode
  2005-02-12 12:20         ` [Ext2-devel] " Alex Tomas
  2005-02-12 18:47           ` Badari Pulavarty
@ 2005-02-12 23:26           ` Badari Pulavarty
  2005-02-12 23:29             ` Alex Tomas
  1 sibling, 1 reply; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-12 23:26 UTC (permalink / raw)
  To: Alex Tomas; +Cc: Andrew Morton, ext2-devel, linux-fsdevel, sct

Alex Tomas wrote:

>>>>>>Badari Pulavarty (BP) writes:
> 
> 
>  BP> Test: writes 10,000 blocks of 64k and does fdatasync().
> 
> The patch doesn't apply ;) after a minor correction
> I've tested the patch too:
> 
...
> UP, after:
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m21.452s user    0m0.026s sys     0m6.105s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m23.317s user    0m0.027s sys     0m4.206s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.190s user    0m0.022s sys     0m4.026s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.024s user    0m0.021s sys     0m4.103s
> 
> 
> UP, before:
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m19.138s user    0m0.021s sys     0m5.980s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m23.123s user    0m0.026s sys     0m4.083s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.152s user    0m0.017s sys     0m3.836s
> 
> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
> real    0m25.217s user    0m0.026s sys     0m3.827s

Hmm.. On UP, the patch made sys.time a little worse ?

- Badari

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [RFC] ext3 writepages for writeback mode
  2005-02-12 23:26           ` Badari Pulavarty
@ 2005-02-12 23:29             ` Alex Tomas
  2005-02-14 15:58               ` Badari Pulavarty
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Tomas @ 2005-02-12 23:29 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Alex Tomas, Andrew Morton, ext2-devel, linux-fsdevel, sct

>>>>> Badari Pulavarty (BP) writes:

 >> UP, after:
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m21.452s user    0m0.026s sys     0m6.105s
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m23.317s user    0m0.027s sys     0m4.206s
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m25.190s user    0m0.022s sys     0m4.026s
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m25.024s user    0m0.021s sys     0m4.103s
 >> UP, before:
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m19.138s user    0m0.021s sys     0m5.980s
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m23.123s user    0m0.026s sys     0m4.083s
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m25.152s user    0m0.017s sys     0m3.836s
 >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
 >> real    0m25.217s user    0m0.026s sys     0m3.827s

 BP> Hmm.. On UP, the patch made sys.time little worse ?

grrrr. sorry, that's a typo. exchange 'after' and 'before' please.
btw, want to see how delayed allocation manages this?

thanks, Alex


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ext2-devel] Re: [RFC] ext3 writepages for writeback mode
  2005-02-12 23:29             ` Alex Tomas
@ 2005-02-14 15:58               ` Badari Pulavarty
  2005-02-14 16:34                 ` Alex Tomas
  0 siblings, 1 reply; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-14 15:58 UTC (permalink / raw)
  To: Alex Tomas; +Cc: Andrew Morton, ext2-devel, linux-fsdevel, sct

On Sat, 2005-02-12 at 15:29, Alex Tomas wrote:
> >>>>> Badari Pulavarty (BP) writes:
> 
>  >> UP, after:
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m21.452s user    0m0.026s sys     0m6.105s
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m23.317s user    0m0.027s sys     0m4.206s
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m25.190s user    0m0.022s sys     0m4.026s
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m25.024s user    0m0.021s sys     0m4.103s
>  >> UP, before:
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m19.138s user    0m0.021s sys     0m5.980s
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m23.123s user    0m0.026s sys     0m4.083s
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m25.152s user    0m0.017s sys     0m3.836s
>  >> [root@zefir root]# time /work/tests/fwrite /test/fff 64 10000
>  >> real    0m25.217s user    0m0.026s sys     0m3.827s
> 
>  BP> Hmm.. On UP, the patch made sys.time little worse ?
> 
> grrrr. sorry, that's a typo. exchange 'after' and 'before' please.

Thank you. I was worried for a minute.

> btw, want to see how delayed allocation manages this?

Sure. I think it will improve the allocation case.
The non-allocation case should be pretty much the same, provided
I got a contiguous layout on the disk.

Is that right ?

Thanks,
Badari


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [RFC] ext3 writepages for writeback mode
  2005-02-14 15:58               ` Badari Pulavarty
@ 2005-02-14 16:34                 ` Alex Tomas
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Tomas @ 2005-02-14 16:34 UTC (permalink / raw)
  To: Badari Pulavarty
  Cc: Alex Tomas, Andrew Morton, ext2-devel, linux-fsdevel, sct

>>>>> Badari Pulavarty (BP) writes:
 BP> Sure. I think it will improve the allocation case.
 BP> The non-allocation case should be pretty much the same, provided
 BP> I get a contiguous layout on the disk. Wouldn't it?


It is not only the allocation case that improves:

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m13.102s user    0m0.027s sys     0m4.003s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m22.210s user    0m0.006s sys     0m3.241s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m23.242s user    0m0.005s sys     0m3.202s

[root@bob root]# time /work/tests/fwrite /test/fff 64 10000
real    0m24.777s user    0m0.011s sys     0m3.175s

I think at least a couple of things are involved:
 - for an already allocated block I don't open a transaction
 - the extents code caches the last found extent, so most lookups
   just do a few compares with in-inode integers and don't
   walk through a tree

btw, I've observed the following issue: sometimes the VM can
run several ->writepages() against the same file, or mix
->writepages() with ->writepage(). I think this can break
the layout ...

thanks, Alex




* Re: [RFC] ext3 writepages for writeback mode
  2005-02-11  1:31 [RFC] ext3 writepages for writeback mode Badari Pulavarty
  2005-02-11  1:53 ` Andrew Morton
@ 2005-02-14 16:50 ` Thiago Rondon
  2005-02-14 18:08   ` Badari Pulavarty
  1 sibling, 1 reply; 20+ messages in thread
From: Thiago Rondon @ 2005-02-14 16:50 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: ext2-devel, linux-fsdevel, sct, Andrew Morton

Can you explain in more detail what you wrote?

Thanks in advance!
-Thiago Rondon


On Thu, 10 Feb 2005 17:48:55 -0800 (PST), Badari Pulavarty
<pbadari@us.ibm.com> wrote:
> Hi,
> 
> Here is my first cut at adding writepages() support for
> ext3 writeback mode.
> 
> I have not done any performance analysis on the patch,
> so try it at your own risk.
> 
> Please let me know, if I am completely off or its a
> stupid idea.
> 
> Thanks,
> Badari
> 
> 
>


* Re: [RFC] ext3 writepages for writeback mode
  2005-02-14 16:50 ` Thiago Rondon
@ 2005-02-14 18:08   ` Badari Pulavarty
  0 siblings, 0 replies; 20+ messages in thread
From: Badari Pulavarty @ 2005-02-14 18:08 UTC (permalink / raw)
  To: Thiago Rondon; +Cc: ext2-devel, linux-fsdevel, sct, Andrew Morton

Okay, here is the background.

The writepages() interface gets used to flush the
entire file. ext3 doesn't have a writepages() method.
Without it, the generic writepages code calls
writepage() in a loop. Since we are operating
on one page at a time, we end up submitting one block
of IO at a time, and can only hope that the IO scheduler
does all the merging needed to submit a large IO.

My patch provides a writepages() method for ext3
writeback mode (ordered mode is a little more complicated).
writepages() can batch up larger IO chunks, since
it operates on multiple pages/blocks. In this case,
the IO scheduler has less work to do.


Thanks,
Badari





* Re: Re: [RFC] ext3 writepages for writeback mode
  2005-02-12  0:09       ` Badari Pulavarty
@ 2005-02-14 20:02         ` Sonny Rao
  2005-02-14 20:22           ` [Ext2-devel] " Sonny Rao
  0 siblings, 1 reply; 20+ messages in thread
From: Sonny Rao @ 2005-02-14 20:02 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, ext2-devel, linux-fsdevel, sct

On Fri, Feb 11, 2005 at 04:09:46PM -0800, Badari Pulavarty wrote:
> On Fri, 2005-02-11 at 15:58, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > Due to lack of interesting suggestions to solve 
> > > mpage_writepages() -> ext3_writeback_writepage() problem,
> > > I fixed it in the dumbest possible way.
> > 
> > I've actually forgotten what the problem was.  It was 100 patches ago :(
> 
> The problem was 
> 	ext3_writeback_writepages() -> mpage_writepages() could
> call back ext3_writeback_writepage() in the "confused" case.
> ext3_writeback_writepage() could end up doing nothing, since the
> we already have a journal handle.
> 
> I added a writepage_helper to handle this case.
> 
> > 
> > > Let me know, what you think.
> > 
> > 
> > If it works, let's get some benchmark numbers so we can decide whether it
> > justifies more development?
> > 
> 
> Yep. I will get some numbers to see ..
> 

I'm helping Badari collect data on this...

My setup is a 2.0 GHz P4 booted with 1 GB of RAM and 1 CPU, attached via
Fiber to a seven-disk RAID0 array with write caching turned off
(write caching can skew numbers significantly if you aren't careful
to let the cache drain between runs, etc.).

The test is a single-threaded sequential overwrite of a 20GB data set
divided into 512MB files which are selected randomly and overwritten.

All of these numbers represent the average of three five-minute runs.

All ext3 tests are in writeback mode

FS		Throughput	CPU Utilization
--		----------	---------------
Ext3		78 MB/sec	75.9 %
Ext3 + wpages	85 MB/sec	74.7 %

Just for comparison:

Ext2		88.5 MB/sec	74.2 %
Ext2 + nobh	89.6 MB/sec	71.7 %

JFS		94.8 MB/sec	85.6 %
XFS		100 MB/sec	95.5 %


So, Badari's writepages patch improves throughput on this particular
setup by about 9 % (78 -> 85 MB/sec)


I can rerun with more processors/ram, or different disk configurations
if anyone is interested.

Sonny



* Re: [Ext2-devel] Re: [RFC] ext3 writepages for writeback mode
  2005-02-14 20:02         ` Sonny Rao
@ 2005-02-14 20:22           ` Sonny Rao
  0 siblings, 0 replies; 20+ messages in thread
From: Sonny Rao @ 2005-02-14 20:22 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, ext2-devel, linux-fsdevel, sct

[-- Attachment #1: Type: text/plain, Size: 2239 bytes --]

On Mon, Feb 14, 2005 at 03:02:56PM -0500, Sonny Rao wrote:
> So, Badari's writepages patch improves performance on this particular
> setup by almost 10 %
> 
> 
> I can rerun with more processors/ram, or different disk configurations
> if anyone is interested.
> 
> Sonny

One other detail:

I had to fix the patch; for some reason it was malformed.

I've attached my version of the patch.

Sonny



[-- Attachment #2: ext3-writeback-writepages.patch2-fixed --]
[-- Type: text/plain, Size: 5168 bytes --]

diff -Naurp linux-2.6.10-orig/fs/ext3/inode.c linux-2.6.10/fs/ext3/inode.c
--- linux-2.6.10-orig/fs/ext3/inode.c	2005-01-21 00:51:27.000000000 -0600
+++ linux-2.6.10/fs/ext3/inode.c	2005-02-14 11:36:52.387332295 -0600
@@ -856,6 +856,12 @@ get_block:
 	return ret;
 }
 
+static int ext3_writepages_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh, int create)
+{
+	return ext3_direct_io_get_blocks(inode, iblock, 1, bh, create);
+}
+
 /*
  * `handle' can be NULL if create is zero
  */
@@ -1321,6 +1327,44 @@ out_fail:
 	return ret;
 }
 
+static int ext3_writeback_writepage_helper(struct page *page,
+				struct writeback_control *wbc)
+{
+	return block_write_full_page(page, ext3_get_block, wbc);
+}
+
+static int
+ext3_writeback_writepages(struct address_space *mapping, 
+				struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	handle_t *handle = NULL;
+	int err, ret = 0;
+
+	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		return ret;
+
+	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		return ret;
+	}
+
+	ret = __mpage_writepages(mapping, wbc, ext3_writepages_get_block,
+					ext3_writeback_writepage_helper);
+
+	/*
+	 * Need to reacquire the handle since ext3_writepages_get_block()
+	 * can restart the handle
+	 */
+	handle = journal_current_handle();
+
+	err = ext3_journal_stop(handle);
+	if (!ret)
+		ret = err;
+	return ret;
+}
+
 static int ext3_writeback_writepage(struct page *page,
 				struct writeback_control *wbc)
 {
@@ -1552,6 +1596,7 @@ static struct address_space_operations e
 	.readpage	= ext3_readpage,
 	.readpages	= ext3_readpages,
 	.writepage	= ext3_writeback_writepage,
+	.writepages	= ext3_writeback_writepages,
 	.sync_page	= block_sync_page,
 	.prepare_write	= ext3_prepare_write,
 	.commit_write	= ext3_writeback_commit_write,
diff -Naurp linux-2.6.10-orig/fs/mpage.c linux-2.6.10/fs/mpage.c
--- linux-2.6.10-orig/fs/mpage.c	2004-10-18 16:53:43.000000000 -0500
+++ linux-2.6.10/fs/mpage.c	2005-02-14 11:36:52.383332918 -0600
@@ -387,7 +387,8 @@ EXPORT_SYMBOL(mpage_readpage);
  */
 static struct bio *
 mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
-	sector_t *last_block_in_bio, int *ret, struct writeback_control *wbc)
+	sector_t *last_block_in_bio, int *ret, struct writeback_control *wbc,
+	writepage_t writepage_helper)
 {
 	struct address_space *mapping = page->mapping;
 	struct inode *inode = page->mapping->host;
@@ -580,7 +581,7 @@ alloc_new:
 confused:
 	if (bio)
 		bio = mpage_bio_submit(WRITE, bio);
-	*ret = page->mapping->a_ops->writepage(page, wbc);
+	*ret = writepage_helper(page, wbc);
 	/*
 	 * The caller has a ref on the inode, so *mapping is stable
 	 */
@@ -619,6 +620,15 @@ int
 mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block)
 {
+	return __mpage_writepages(mapping, wbc, get_block,
+			 mapping->a_ops->writepage);
+}
+
+int
+__mpage_writepages(struct address_space *mapping,
+		struct writeback_control *wbc, get_block_t get_block,
+		writepage_t writepage_helper)
+{
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
@@ -707,7 +717,8 @@ retry:
 				}
 			} else {
 				bio = mpage_writepage(bio, page, get_block,
-						&last_block_in_bio, &ret, wbc);
+						&last_block_in_bio, &ret, wbc,
+						writepage_helper);
 			}
 			if (ret || (--(wbc->nr_to_write) <= 0))
 				done = 1;
@@ -735,3 +746,4 @@ retry:
 	return ret;
 }
 EXPORT_SYMBOL(mpage_writepages);
+EXPORT_SYMBOL(__mpage_writepages);
diff -Naurp linux-2.6.10-orig/include/linux/fs.h linux-2.6.10/include/linux/fs.h
--- linux-2.6.10-orig/include/linux/fs.h	2005-01-21 00:51:40.000000000 -0600
+++ linux-2.6.10/include/linux/fs.h	2005-02-14 11:36:19.214499355 -0600
@@ -27,6 +27,7 @@ struct poll_table_struct;
 struct kstatfs;
 struct vm_area_struct;
 struct vfsmount;
+struct writeback_control;
 
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -244,6 +245,7 @@ typedef int (get_blocks_t)(struct inode 
 			struct buffer_head *bh_result, int create);
 typedef void (dio_iodone_t)(struct inode *inode, loff_t offset,
 			ssize_t bytes, void *private);
+typedef int (writepage_t)(struct page *page, struct writeback_control *wbc);
 
 /*
  * Attribute flags.  These should be or-ed together to figure out what
diff -Naurp linux-2.6.10-orig/include/linux/mpage.h linux-2.6.10/include/linux/mpage.h
--- linux-2.6.10-orig/include/linux/mpage.h	2004-10-18 16:53:51.000000000 -0500
+++ linux-2.6.10/include/linux/mpage.h	2005-02-14 11:36:52.381333229 -0600
@@ -17,6 +17,9 @@ int mpage_readpages(struct address_space
 int mpage_readpage(struct page *page, get_block_t get_block);
 int mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block);
+int __mpage_writepages(struct address_space *mapping,
+		struct writeback_control *wbc, get_block_t get_block,
+		writepage_t writepage);
 
 static inline int
 generic_writepages(struct address_space *mapping, struct writeback_control *wbc)


end of thread, other threads:[~2005-02-14 20:35 UTC | newest]

Thread overview: 20+ messages
2005-02-11  1:31 [RFC] ext3 writepages for writeback mode Badari Pulavarty
2005-02-11  1:53 ` Andrew Morton
2005-02-11  3:06   ` Badari Pulavarty
2005-02-11 23:29   ` Badari Pulavarty
2005-02-11 23:58     ` Andrew Morton
2005-02-12  0:09       ` Badari Pulavarty
2005-02-14 20:02         ` Sonny Rao
2005-02-14 20:22           ` [Ext2-devel] " Sonny Rao
2005-02-12  0:51       ` Badari Pulavarty
2005-02-12  1:00         ` Andrew Morton
2005-02-12  1:55           ` Badari Pulavarty
2005-02-12 12:20         ` [Ext2-devel] " Alex Tomas
2005-02-12 18:47           ` Badari Pulavarty
2005-02-12 21:43             ` Alex Tomas
2005-02-12 23:26           ` Badari Pulavarty
2005-02-12 23:29             ` Alex Tomas
2005-02-14 15:58               ` Badari Pulavarty
2005-02-14 16:34                 ` Alex Tomas
2005-02-14 16:50 ` Thiago Rondon
2005-02-14 18:08   ` Badari Pulavarty
