Re: Btrfs v0.16 released

From: Chris Mason <chris.mason@oracle.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Btrfs v0.16 released
Date: Fri, 15 Aug 2008 16:37:02 -0400
Message-ID: <1218832622.19495.14.camel@think.oraclecorp.com>
In-Reply-To: <20080815195941.GB22395@mit.edu>

On Fri, 2008-08-15 at 15:59 -0400, Theodore Tso wrote:
> On Fri, Aug 15, 2008 at 01:52:52PM -0400, Chris Mason wrote:
> > Have you tried this one:
> > 
> > http://article.gmane.org/gmane.linux.file-systems/25560
> > 
> > This bug should cause fragmentation on small files getting forced out
> > due to memory pressure in ext4.  But, I wasn't able to really
> > demonstrate it with ext4 on my machine.
> 
> I've been able to use compilebench to see the fragmentation problem
> very easily.
> 
> Aneesh has been working on it, and has some fixes that he queued up.
> I'll have to point him at your proposed fix, thanks.  This is what he
> came up with in the common code.  What do you think?
> 

It sounds like ext4 would show the writeback_index bug with
fragmentation on disk and btrfs would show it with seeks during the
benchmark.  I was only watching the throughput numbers and not looking
at filefrag results.

> 						- Ted
> 
> (From Aneesh, on the linux-ext4 list.)
> 
> As I explained in my previous patch, the problem is due to pdflush
> background_writeout. When pdflush does the writeout we may
> have only a few pages for the file, and we would attempt
> to write them to disk. So my attempt in the last patch was to
> do the below
>

pdflush and delalloc and raid stripe alignment and lots of other things
don't play well together.  In general, I think we need one or more
pdflush threads per mounted FS so that write_cache_pages doesn't have to
bail out every time it hits congestion.

The current write_cache_pages code even misses easy chances to create
bigger bios just because a block device is congested when called by
background_writeout().

But I would hope we can deal with a single threaded small file workload
like compilebench without resorting to big rewrites.

> a) When allocating blocks, try to be close to the goal block specified
> b) When we call ext4_da_writepages, make sure we have a minimum nr_to_write
>   that ensures we allocate all dirty buffer_heads in a single go.
>   nr_to_write is set to 1024 in pdflush background_writeout, and that
>   would mean we may end up calling some inodes' writepages() with really
>   small values even though we have more dirty buffer_heads.
> 
> What it doesn't handle is
> 1) File A has 4 dirty buffer_heads.
> 2) pdflush tries to write them. We get 4 contiguous blocks.
> 3) File A now has 5 new dirty buffer_heads.
> 4) File B now has 6 dirty buffer_heads.
> 5) pdflush tries to write the 6 dirty buffer_heads of file B and allocates
> them next to the earlier file A blocks.
> 6) pdflush tries to write the 5 dirty buffer_heads of file A and allocates
> them after the file B blocks, resulting in discontinuity.
> 
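The six steps quoted above can be sketched as a toy model. This is only an
illustration, not ext4 code: it assumes a plain append-only allocator that
always hands out the next free block (so it ignores the goal-block hint from
point a), and the names toy_alloc, toy_init and toy_writeback are hypothetical.
A file gains an extent whenever its new blocks do not start where its previous
blocks ended.

```c
#include <assert.h>

#define MAX_FILES 4

/* Hypothetical toy allocator state; none of this exists in the kernel. */
struct toy_alloc {
	long next_free;            /* next block the allocator will hand out */
	long last_end[MAX_FILES];  /* block just past each file's last extent */
	int extents[MAX_FILES];    /* number of discontiguous extents per file */
};

static void toy_init(struct toy_alloc *a)
{
	int i;

	a->next_free = 0;
	for (i = 0; i < MAX_FILES; i++) {
		a->last_end[i] = -1;   /* -1: file has no blocks yet */
		a->extents[i] = 0;
	}
}

/* One writeback pass that allocates nr dirty blocks for the given file. */
static void toy_writeback(struct toy_alloc *a, int file, int nr)
{
	assert(file >= 0 && file < MAX_FILES && nr > 0);

	/* New blocks are not adjacent to the file's previous ones. */
	if (a->last_end[file] != a->next_free)
		a->extents[file]++;

	a->next_free += nr;
	a->last_end[file] = a->next_free;
}
```

Calling toy_writeback for file A (4 blocks), file B (6 blocks), then file A
again (5 blocks) leaves A with two extents in this model. Writing both of A's
batches before B leaves it with one.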
> I am right now testing the below patch, which makes sure new dirty inodes
> are added to the tail of the dirty inode list
> 
> commit 6ad9d25595aea8efa0d45c0a2dd28b4a415e34e6
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date:   Fri Aug 15 23:19:15 2008 +0530
> 
>     move the dirty inodes to the end of the list
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 25adfc3..91f3c54 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -163,7 +163,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  		 */
>  		if (!was_dirty) {
>  			inode->dirtied_when = jiffies;
> -			list_move(&inode->i_list, &sb->s_dirty);
> +			list_move_tail(&inode->i_list, &sb->s_dirty);
>  		}
>  	}
>  out:

Looks like everyone who walks sb->s_io or s_dirty walks it backwards.
This should make the newly dirtied inode the first one to be processed,
which probably isn't what we want.  I could be reading it backwards of
course ;)
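
A userspace sketch of the ordering question, assuming the walkers really do
start from the tail (head->prev) as described above. The list helpers below
are simplified stand-ins that mirror the semantics of the kernel's list_move()
and list_move_tail(). struct toy_inode and backward_walk_order() are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

struct toy_inode {
	int id;                    /* hypothetical: order in which it was dirtied */
	struct list_head i_list;
};

static void init_list(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void insert_between(struct list_head *n, struct list_head *prev,
			   struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

static void unlink_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Unlink the entry, then add it right after the head. */
static void list_move(struct list_head *e, struct list_head *head)
{
	unlink_entry(e);
	insert_between(e, head, head->next);
}

/* Unlink the entry, then add it right before the head (at the tail). */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
	unlink_entry(e);
	insert_between(e, head->prev, head);
}

#define MAX_INODES 8

/*
 * Dirty inodes 0..n-1 in that order, then walk the list backwards
 * (head->prev first) and record the ids in processing order.
 */
static void backward_walk_order(int use_tail, int n, int out[])
{
	struct list_head s_dirty;
	struct toy_inode inodes[MAX_INODES];
	struct list_head *pos;
	int i;

	assert(n >= 0 && n <= MAX_INODES);
	init_list(&s_dirty);
	for (i = 0; i < n; i++) {
		inodes[i].id = i;
		init_list(&inodes[i].i_list);
		if (use_tail)
			list_move_tail(&inodes[i].i_list, &s_dirty);
		else
			list_move(&inodes[i].i_list, &s_dirty);
	}

	i = 0;
	for (pos = s_dirty.prev; pos != &s_dirty; pos = pos->prev) {
		struct toy_inode *ino = (struct toy_inode *)
			((char *)pos - offsetof(struct toy_inode, i_list));
		out[i++] = ino->id;
	}
}
```

With use_tail set to 0 (the existing list_move), a backward walk visits the
inodes oldest first. With use_tail set to 1 (the patched list_move_tail), the
same walk visits the most recently dirtied inode first.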

-chris
