From: Theodore Tso <tytso@mit.edu>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-mm@kvack.org,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, chris.mason@oracle.com,
jens.axboe@oracle.com
Subject: Re: [PATCH, RFC] vm: Add an tuning knob for vm.max_writeback_pages
Date: Sun, 30 Aug 2009 14:17:31 -0400
Message-ID: <20090830181731.GA20822@mit.edu>
In-Reply-To: <20090830165229.GA5189@infradead.org>
On Sun, Aug 30, 2009 at 12:52:29PM -0400, Christoph Hellwig wrote:
> On Sat, Aug 29, 2009 at 10:54:18PM -0400, Theodore Ts'o wrote:
> > MAX_WRITEBACK_PAGES was hard-coded to 1024 because of a concern of not
> > holding I_SYNC for too long. But this shouldn't be a concern since
> > I_LOCK and I_SYNC have been separated. So make it be a tunable and
> > change the default to be 32768.
> >
> > This change is helpful for ext4 since it means we write out large files
> > in bigger chunks than just 4 megabytes at a time, so that when we have
> > multiple large files in the page cache waiting for writeback, the
> > files don't end up getting interleaved. There shouldn't be any downside.
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=13930
>
> The current writeback sizes are definitely too small, we shoved
> a hack into XFS to bump up nr_to_write to four times the value the
> VM sends us to be able to saturate medium-sized RAID arrays in XFS.
Hmm, should we make it a per-superblock tunable so that it can
either be tuned on a per-block device basis or the filesystem code can
adjust it to its liking? I thought about it, but decided maybe it
was better to keep it simple.
> Turns out this was not enough and at least for Chris Mason's array
> we only started saturating at * 16. I suspect your patch will give
> a similar effect.
So you think 16384 would be a better default? The reason why I picked
32768 was because that was the size of the ext4 block group, but
otherwise it was totally arbitrary. I haven't done any
benchmarking yet, which is one of the reasons why I thought about
making it a tunable.
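For reference, the page counts being discussed translate into bytes as in the sketch below. It assumes 4 KiB pages (consistent with 1024 pages being "4 megabytes" above); the helper name is made up for illustration and is not a kernel interface.

```python
PAGE_SIZE = 4096  # bytes; assumption: 4 KiB pages


def writeback_chunk_mib(nr_pages, page_size=PAGE_SIZE):
    """Return the size in MiB of one writeback pass of nr_pages pages."""
    return nr_pages * page_size / (1024 * 1024)


# Candidate values mentioned in this thread.
candidates = [
    ("old hard-coded limit", 1024),
    ("XFS hack (4x)", 4 * 1024),
    ("Chris Mason's array (16x)", 16 * 1024),
    ("proposed default", 32768),
]

for label, pages in candidates:
    print(f"{label:26s} {pages:6d} pages = {writeback_chunk_mib(pages):6.1f} MiB")
```

So the 16x figure corresponds to 16384 pages (64 MiB), and the proposed default of 32768 pages to 128 MiB per pass.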
> And btw, I think referring to the historic code in the comment is not
> a good idea, it's just going to confuse the heck out of everyone looking
> at it in the future. The information above makes sense for the commit
> message.
Yeah, good point.
> And the other big question is how this interacts with Jens' new per-bdi
> flushing code that we still hope to merge in 2.6.32.
Jens? What do you think? Fixing MAX_WRITEBACK_PAGES was something I
really wanted to merge in 2.6.32, since it makes a huge difference to
the block allocation layout for an "rsync -avH /old-fs /new-fs" when we
are copying a bunch of large files (say, 800 meg ISO images). Because
the writeback routine writes out only 4 megs at a time, our files get
horribly interleaved and thus fragmented.
I initially thought about adding some massive workarounds in the
filesystem layer (which I guess is what XFS did), but I ultimately
decided this was begging to be solved in the page writeback code,
especially since it's *such* an easy fix.
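To illustrate the interleaving effect described above, here is a toy model, not a description of what ext4 actually does. It assumes writeback visits the dirty files round-robin, writes at most chunk_pages from each file per visit, and that every write lands at the next free block on disk. Real allocators are smarter than this, so the absolute numbers mean little; only the trend with chunk size matters. The function name and model are hypothetical.

```python
def count_extents(num_files, file_pages, chunk_pages):
    """Count contiguous on-disk extents per file under the toy model.

    Assumptions (illustrative only): round-robin writeback across files,
    at most chunk_pages written per visit, strictly sequential allocation.
    """
    remaining = [file_pages] * num_files
    extents = [0] * num_files
    last_writer = None  # file that wrote the previous chunk on disk
    while any(remaining):
        for f in range(num_files):
            if remaining[f] == 0:
                continue
            remaining[f] -= min(chunk_pages, remaining[f])
            # A new extent starts whenever another file wrote in between.
            if last_writer != f:
                extents[f] += 1
            last_writer = f
    return extents


# Four 800 MiB files; 800 MiB is 204800 pages, assuming 4 KiB pages.
FILE_PAGES = 800 * 256
for chunk in (1024, 16384, 32768):
    print(chunk, count_extents(4, FILE_PAGES, chunk))
```

Under these assumptions each file is split into 200 pieces with 1024-page chunks, but only 7 pieces with 32768-page chunks.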
> Maybe we'll actually get some sane writeback code for the first time.
To quote from "Fiddler on the Roof", from your lips to God's ears....
:-)
- Ted