Re: [PATCH updated] ext4: Fix file fragmentation during large file write.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: cmm@us.ibm.com, tytso@mit.edu, sandeen@redhat.com,
	akpm@linux-foundation.org, hch@infradead.org, steve@chygwyn.com,
	npiggin@suse.de, mpatocka@redhat.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH updated] ext4: Fix file fragmentation during large file write.
Date: Thu, 16 Oct 2008 13:20:54 +0530	[thread overview]
Message-ID: <20081016075054.GC19480@skywalker> (raw)
In-Reply-To: <1224114692.6938.48.camel@think.oraclecorp.com>

On Wed, Oct 15, 2008 at 07:51:32PM -0400, Chris Mason wrote:
> On Wed, 2008-10-15 at 16:41 -0400, Chris Mason wrote:
> > On Fri, 2008-10-10 at 23:32 +0530, Aneesh Kumar K.V wrote:
> > > The range_cyclic writeback mode use the address_space
> > > writeback_index as the start index for writeback. With
> > > delayed allocation we were updating writeback_index
> > > wrongly resulting in highly fragmented file. Number of
> > > extents reduced from 4000 to 27 for a 3GB file with
> > > the below patch.
> > > 
> > 
> > I tested the ext4 patch queue from today on top of 2.6.27, and this
> > includes Aneesh's latest patches.
> > 
> > Things are going at disk speed for streaming writes, with the number of
> > extents generated for a 32GB file down to 27.  So, this is definitely an
> > improvement for ext4.
> 
> Just FYI, I ran this with compilebench -i 20 --makej and my log is full
> of these:
> 
> ext4_da_writepages: jbd2_start: 1024 pages, ino 520417; err -30
> Pid: 4072, comm: pdflush Not tainted 2.6.27 #2
> 
> Call Trace:
>  [<ffffffffa0048493>] ext4_da_writepages+0x171/0x2d3 [ext4]
>  [<ffffffff802336be>] ? pick_next_task_fair+0x80/0x91
>  [<ffffffff80228fa8>] ? source_load+0x2a/0x58
>  [<ffffffff8038e499>] ? __next_cpu+0x19/0x26
>  [<ffffffff8026748f>] do_writepages+0x28/0x37
>  [<ffffffff802a6b39>] __writeback_single_inode+0x14f/0x26d
>  [<ffffffff802a6fb7>] generic_sync_sb_inodes+0x1c1/0x2a2
>  [<ffffffff802a70a1>] sync_sb_inodes+0x9/0xb
>  [<ffffffff802a73dc>] writeback_inodes+0x64/0xad
>  [<ffffffff802675db>] wb_kupdate+0x9a/0x10c
>  [<ffffffff80267fd1>] ? pdflush+0x0/0x1e9
>  [<ffffffff80267fd1>] ? pdflush+0x0/0x1e9
>  [<ffffffff8026810e>] pdflush+0x13d/0x1e9
>  [<ffffffff80267541>] ? wb_kupdate+0x0/0x10c
>  [<ffffffff80248222>] kthread+0x49/0x77
>  [<ffffffff8020c5e9>] child_rip+0xa/0x11
>  [<ffffffff802481d9>] ? kthread+0x0/0x77
>  [<ffffffff8020c5df>] ? child_rip+0x0/0x11

I actually did the mount -o remount,ro test before sending out the
patches to see if we are skipping some pages during writeback. I
didn't see the error at that time. Today i tried again on a larger
file system and i am able to reproduce the above stack with -o
remount,ro. The patch below fix it for me. The VFS writeback looks
at pages_skipped and make some decision if the value  increase
during call back(__fynsc_super -> generic_sync_sb_inodes).
So we need to update pages_skipped also. This may not apply on
top of what patches are in patchqueue. I also have other changes
to use single variable no_nrwite_index_update as per Christoph
suggestion. I will send out the patches for patchqueue after 
I look at the compile bench numbers you reported.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3c4b9b4..88ce29e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2370,10 +2370,10 @@ static int ext4_da_writepages(struct address_space *mapping,
 	pgoff_t	index;
 	int range_whole = 0;
 	handle_t *handle = NULL;
-	long pages_written = 0;
 	struct mpage_da_data mpd;
 	struct inode *inode = mapping->host;
 	int no_nrwrite_index_update;
+	long pages_written = 0, pages_skipped;
 	int needed_blocks, ret = 0, nr_to_writebump = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
 
@@ -2411,6 +2411,7 @@ static int ext4_da_writepages(struct address_space *mapping,
 	 */
 	no_nrwrite_index_update = wbc->no_nrwrite_index_update;
 	wbc->no_nrwrite_index_update = 1;
+	pages_skipped = wbc->pages_skipped;
 
 	while (!ret && wbc->nr_to_write > 0) {
 
@@ -2444,6 +2445,7 @@ static int ext4_da_writepages(struct address_space *mapping,
 			 * and try again
 			 */
 			jbd2_journal_force_commit_nested(sbi->s_journal);
+			wbc->pages_skipped = pages_skipeed;
 			ret = 0;
 		} else if (ret == MPAGE_DA_EXTENT_TAIL) {
 			/*
@@ -2451,6 +2453,7 @@ static int ext4_da_writepages(struct address_space *mapping,
 			 * rest of the pages
 			 */
 			pages_written += mpd.pages_written;
+			wbc->pages_skipped = pages_skipeed;
 			ret = 0;
 		} else if (wbc->nr_to_write)
 			/*

WARNING: multiple messages have this Message-ID (diff)

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: cmm@us.ibm.com, tytso@mit.edu, sandeen@redhat.com,
	akpm@linux-foundation.org, hch@infradead.org, steve@chygwyn.com,
	npiggin@suse.de, mpatocka@redhat.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH updated] ext4: Fix file fragmentation during large file write.
Date: Thu, 16 Oct 2008 13:20:54 +0530	[thread overview]
Message-ID: <20081016075054.GC19480@skywalker> (raw)
In-Reply-To: <1224114692.6938.48.camel@think.oraclecorp.com>

On Wed, Oct 15, 2008 at 07:51:32PM -0400, Chris Mason wrote:
> On Wed, 2008-10-15 at 16:41 -0400, Chris Mason wrote:
> > On Fri, 2008-10-10 at 23:32 +0530, Aneesh Kumar K.V wrote:
> > > The range_cyclic writeback mode use the address_space
> > > writeback_index as the start index for writeback. With
> > > delayed allocation we were updating writeback_index
> > > wrongly resulting in highly fragmented file. Number of
> > > extents reduced from 4000 to 27 for a 3GB file with
> > > the below patch.
> > > 
> > 
> > I tested the ext4 patch queue from today on top of 2.6.27, and this
> > includes Aneesh's latest patches.
> > 
> > Things are going at disk speed for streaming writes, with the number of
> > extents generated for a 32GB file down to 27.  So, this is definitely an
> > improvement for ext4.
> 
> Just FYI, I ran this with compilebench -i 20 --makej and my log is full
> of these:
> 
> ext4_da_writepages: jbd2_start: 1024 pages, ino 520417; err -30
> Pid: 4072, comm: pdflush Not tainted 2.6.27 #2
> 
> Call Trace:
>  [<ffffffffa0048493>] ext4_da_writepages+0x171/0x2d3 [ext4]
>  [<ffffffff802336be>] ? pick_next_task_fair+0x80/0x91
>  [<ffffffff80228fa8>] ? source_load+0x2a/0x58
>  [<ffffffff8038e499>] ? __next_cpu+0x19/0x26
>  [<ffffffff8026748f>] do_writepages+0x28/0x37
>  [<ffffffff802a6b39>] __writeback_single_inode+0x14f/0x26d
>  [<ffffffff802a6fb7>] generic_sync_sb_inodes+0x1c1/0x2a2
>  [<ffffffff802a70a1>] sync_sb_inodes+0x9/0xb
>  [<ffffffff802a73dc>] writeback_inodes+0x64/0xad
>  [<ffffffff802675db>] wb_kupdate+0x9a/0x10c
>  [<ffffffff80267fd1>] ? pdflush+0x0/0x1e9
>  [<ffffffff80267fd1>] ? pdflush+0x0/0x1e9
>  [<ffffffff8026810e>] pdflush+0x13d/0x1e9
>  [<ffffffff80267541>] ? wb_kupdate+0x0/0x10c
>  [<ffffffff80248222>] kthread+0x49/0x77
>  [<ffffffff8020c5e9>] child_rip+0xa/0x11
>  [<ffffffff802481d9>] ? kthread+0x0/0x77
>  [<ffffffff8020c5df>] ? child_rip+0x0/0x11

I actually did the mount -o remount,ro test before sending out the
patches to see if we are skipping some pages during writeback. I
didn't see the error at that time. Today i tried again on a larger
file system and i am able to reproduce the above stack with -o
remount,ro. The patch below fix it for me. The VFS writeback looks
at pages_skipped and make some decision if the value  increase
during call back(__fynsc_super -> generic_sync_sb_inodes).
So we need to update pages_skipped also. This may not apply on
top of what patches are in patchqueue. I also have other changes
to use single variable no_nrwite_index_update as per Christoph
suggestion. I will send out the patches for patchqueue after 
I look at the compile bench numbers you reported.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3c4b9b4..88ce29e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2370,10 +2370,10 @@ static int ext4_da_writepages(struct address_space *mapping,
 	pgoff_t	index;
 	int range_whole = 0;
 	handle_t *handle = NULL;
-	long pages_written = 0;
 	struct mpage_da_data mpd;
 	struct inode *inode = mapping->host;
 	int no_nrwrite_index_update;
+	long pages_written = 0, pages_skipped;
 	int needed_blocks, ret = 0, nr_to_writebump = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
 
@@ -2411,6 +2411,7 @@ static int ext4_da_writepages(struct address_space *mapping,
 	 */
 	no_nrwrite_index_update = wbc->no_nrwrite_index_update;
 	wbc->no_nrwrite_index_update = 1;
+	pages_skipped = wbc->pages_skipped;
 
 	while (!ret && wbc->nr_to_write > 0) {
 
@@ -2444,6 +2445,7 @@ static int ext4_da_writepages(struct address_space *mapping,
 			 * and try again
 			 */
 			jbd2_journal_force_commit_nested(sbi->s_journal);
+			wbc->pages_skipped = pages_skipeed;
 			ret = 0;
 		} else if (ret == MPAGE_DA_EXTENT_TAIL) {
 			/*
@@ -2451,6 +2453,7 @@ static int ext4_da_writepages(struct address_space *mapping,
 			 * rest of the pages
 			 */
 			pages_written += mpd.pages_written;
+			wbc->pages_skipped = pages_skipeed;
 			ret = 0;
 		} else if (wbc->nr_to_write)
 			/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-10-16  7:50 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-10 18:02 [PATCH updated] ext4: Fix file fragmentation during large file write Aneesh Kumar K.V
2008-10-10 18:02 ` Aneesh Kumar K.V
2008-10-10 18:12 ` Aneesh Kumar K.V
2008-10-10 18:12 ` Aneesh Kumar K.V
2008-10-10 18:12   ` Aneesh Kumar K.V
2008-10-10 19:15 ` Chris Mason
2008-10-11 10:51 ` Nick Piggin
2008-10-11 10:51   ` Nick Piggin
2008-10-11 18:14   ` Aneesh Kumar K.V
2008-10-11 18:14     ` Aneesh Kumar K.V
2008-10-11 19:07     ` Aneesh Kumar K.V
2008-10-11 19:07       ` Aneesh Kumar K.V
2008-10-15 20:41 ` Chris Mason
2008-10-15 23:51   ` Chris Mason
2008-10-16  2:41     ` Theodore Tso
2008-10-16  2:41       ` Theodore Tso
2008-10-16  2:44     ` Andreas Dilger
2008-10-16  2:44       ` Andreas Dilger
2008-10-16  7:50     ` Aneesh Kumar K.V [this message]
2008-10-16  7:50       ` Aneesh Kumar K.V
2008-10-16  9:10     ` Aneesh Kumar K.V
2008-10-16  9:10       ` Aneesh Kumar K.V
2008-10-16 11:13       ` Aneesh Kumar K.V
2008-10-16 11:13         ` Aneesh Kumar K.V
  -- strict thread matches above, loose matches on Subject: below --
2008-10-10 18:02 Aneesh Kumar K.V

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:3c4b9b4 dfblob:88ce29e dfblob:3c4b9b4 dfblob:88ce29e )
 OR (
bs:"Re: [PATCH updated] ext4: Fix file fragmentation during large file write." )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081016075054.GC19480@skywalker \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=cmm@us.ibm.com \
    --cc=hch@infradead.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mpatocka@redhat.com \
    --cc=npiggin@suse.de \
    --cc=sandeen@redhat.com \
    --cc=steve@chygwyn.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.